Below is a who-trusts-whom network of people who trade Bitcoin on a platform called Bitcoin OTC. Because Bitcoin users are anonymous, the platform keeps a record of users' reputations to help members avoid transactions with fraudulent or risky users. Members of Bitcoin OTC rate other members on a scale of -10 (total distrust) to +10 (total trust) in steps of 1.
Let's see if we can produce some meaningful insights by visualizing a snapshot of the network.
Dataset statistics¶
- Nodes 5,881
- Edges 35,592
- Range of edge weight -10 to +10
- Percentage of positive edges 89%
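As a rough sketch of how summary figures like these could be derived, the snippet below computes the same four statistics for a tiny hand-made edge list. The rows are invented for illustration and are not taken from the dataset; `toy` is a hypothetical name.

```python
import pandas as pd

# Hypothetical ratings, invented for illustration (not real data)
toy = pd.DataFrame({
    "source": [1, 1, 2, 3, 4],
    "target": [2, 3, 3, 1, 1],
    "rating": [10, 2, -10, 5, 1],
})

# Nodes are all distinct ids that appear as rater or ratee
n_nodes = pd.concat([toy["source"], toy["target"]]).nunique()
n_edges = len(toy)
weight_range = (toy["rating"].min(), toy["rating"].max())
# Share of ratings strictly greater than zero, as a percentage
pct_positive = 100 * (toy["rating"] > 0).mean()

print(f"Nodes: {n_nodes}, Edges: {n_edges}")
print(f"Range of edge weight: {weight_range[0]} to {weight_range[1]}")
print(f"Percentage of positive edges: {pct_positive:.0f}%")
```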
Sampling¶
A one-month sample with a broad range of ratings is used.
Visualization¶
The ratings are displayed as a multi-edged directed graph, laid out with the Fruchterman-Reingold force-directed algorithm.
- Larger nodes have more reviews
- Green nodes are positively reviewed on average
- Red nodes are negatively reviewed on average
- Color intensity is degree of trust (positive or negative)
- Edges are directed
- Longer edges are more negative, shorter edges are more positive
Data format¶
Weighted Signed Directed Bitcoin OTC web of trust network
http://snap.stanford.edu/data/soc-sign-bitcoinotc.html
Each line has one rating, sorted by time, with the following format:
SOURCE, TARGET, RATING, TIME
- SOURCE: node id of source, i.e., rater
- TARGET: node id of target, i.e., ratee
- RATING: the source's rating for the target, ranging from -10 to +10 in steps of 1
- TIME: the time of the rating, measured in seconds since the Unix epoch. (This converts easily to a human-readable date.)
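For reference, here is a minimal sketch of converting an epoch value into a readable date, once with the standard library and once with pandas. The `seconds` value is an arbitrary example, not a timestamp taken from the dataset.

```python
from datetime import datetime, timezone

import pandas as pd

# Example timestamp in seconds since the Unix epoch (chosen for illustration)
seconds = 1394000000

# Standard library: interpret the value as UTC
as_datetime = datetime.fromtimestamp(seconds, tz=timezone.utc)

# pandas: unit="s" tells to_datetime that the number is in seconds
as_timestamp = pd.to_datetime(seconds, unit="s")

print(as_datetime.isoformat())
print(as_timestamp)
```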
Download Data and Parse¶
import os
import requests
import pandas as pd

url = 'https://snap.stanford.edu/data/soc-sign-bitcoinotc.csv.gz'
fname = os.path.basename(url)
# download once and reuse the local copy on later runs
if not os.path.isfile(fname):
    print(f"Downloading...\n{url}")
    r = requests.get(url, allow_redirects=True)
    r.raise_for_status()  # stop early on an HTTP error
    with open(fname, 'wb') as f:
        f.write(r.content)
else:
    print(f"Already downloaded:\n{url}")
# parse the file
cols = ['source', 'target', 'rating', 'time']
df = pd.read_csv(fname, names=cols, header=None)
# the time column holds seconds since the epoch
df['time'] = pd.to_datetime(df['time'], unit='s')
df = df.set_index('time')
df.shape
Subsampling¶
What does the distribution of ratings look like month to month? Can we subsample to visualize a reasonable number of nodes?
# ratings overview by month
df.resample('M').rating.value_counts().unstack().sample(5)
# later adoption years have a broader range of values
# this month has a good spread of ratings
df_ = df.loc['2014-03']
print(f"Ratings: {len(df_)}")
df_.sample(10)
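To make the monthly overview easier to read, here is a small sketch on synthetic data. It groups by calendar month with `to_period("M")` rather than `resample`, which is a swapped-in but equivalent way to count rating values per month. The `toy` frame and the `spread` measure are assumptions for illustration.

```python
import pandas as pd

# Hypothetical ratings indexed by time, invented for illustration
toy = pd.DataFrame(
    {"rating": [10, 10, -10, 1, 2, 2]},
    index=pd.to_datetime([
        "2014-03-01", "2014-03-15", "2014-03-20",
        "2014-04-02", "2014-04-10", "2014-04-11",
    ]),
)

# Count how often each rating value occurs in each calendar month
monthly = (
    toy.groupby([toy.index.to_period("M"), "rating"])
    .size()
    .unstack(fill_value=0)
)
print(monthly)

# Number of distinct rating values seen per month, a crude measure of spread
spread = (monthly > 0).sum(axis=1)
print(spread)
```

A month with a larger `spread` covers more of the rating scale, which is the property used above to pick a sample month.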
NetworkX¶
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
Multiedged Directed Graph¶
A directed graph class that can store multiedges. Multiedges are multiple edges between two nodes. Each edge can hold optional data or attributes.
https://networkx.github.io/documentation/stable/reference/classes/multidigraph.html
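A minimal sketch of what multiedges look like in practice: two ratings from one user to another are kept as separate parallel edges, and the reverse direction is a distinct edge. The node ids and weights are made up for illustration.

```python
import networkx as nx

G = nx.MultiDiGraph()

# Two separate ratings from user 1 to user 2 are stored as parallel edges
G.add_edge(1, 2, weight=5.0)
G.add_edge(1, 2, weight=-3.0)
# Direction matters: a rating from 2 to 1 is a different edge
G.add_edge(2, 1, weight=8.0)

print(G.number_of_nodes())      # distinct users
print(G.number_of_edges())      # all ratings, including parallel ones
print(G.number_of_edges(1, 2))  # only the ratings from 1 to 2

# Each parallel edge has its own key and attribute dict
for u, v, key, data in G.edges(keys=True, data=True):
    print(u, v, key, data["weight"])
```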
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import networkx as nx
import math
DG = nx.MultiDiGraph()
edges = [(t.source, t.target, float(t.rating)) for t in df_.itertuples()]
DG.add_weighted_edges_from(edges)
print(f"Nodes: {DG.number_of_nodes()}")
print(f"Edges: {DG.number_of_edges()}")
Visualization¶
- Node size: number of transactions / reviews
- Node color: average positivity or negativity of node's reviews
- Edge length: the mistrust of a single review, closer is more trusted
- Position: Fruchterman-Reingold force-directed algorithm
We expect trusted nodes to transact more with each other, appear larger, and have on average more positive reviews and thus shorter edges. These should cluster together.
Nodes with lower trust should appear smaller and have longer edges. With the force-directed algorithm, these should be pushed away from trusted clusters.
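To get a feel for how edge weight influences distance, the sketch below lays out a hypothetical three-node graph with `nx.spring_layout` and prints the resulting edge lengths. The heavier edge will usually come out shorter, but the layout is iterative and starts from random positions, so this is a tendency rather than a guarantee. The graph, the seed, and the `edge_length` helper are assumptions for illustration.

```python
import networkx as nx
import numpy as np

# Hypothetical toy graph: "a" trusts "b" strongly and "c" only weakly
G = nx.DiGraph()
G.add_weighted_edges_from([("a", "b", 10.0), ("a", "c", 1.0)])

# spring_layout uses the "weight" edge attribute as attraction strength;
# a fixed seed makes the random starting positions reproducible
pos = nx.spring_layout(G, weight="weight", seed=42)

def edge_length(u, v):
    """Euclidean distance between two nodes in the layout."""
    return float(np.linalg.norm(pos[u] - pos[v]))

print(f"a-b (weight 10): {edge_length('a', 'b'):.3f}")
print(f"a-c (weight 1):  {edge_length('a', 'c'):.3f}")
```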
# build a lookup of review counts and average trust rating
review_counts = df_.groupby('target').rating.count()
average_reviews = df_.groupby('target').rating.mean()
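def get_size(user_id, base=300):
    # node size grows linearly with the number of reviews received
    count = review_counts.get(user_id)
    if count is None:
        # users who received no reviews in this sample get the base size
        return base
    return base * count

def get_color(user_id):
    avg = average_reviews.get(user_id)
    if avg is None:
        # no reviews received: use the neutral midpoint of the colormap
        return 0.5
    # rescale the average rating from [-10, 10] to [0, 1] for the colormap
    return np.interp(avg, (-10, 10), (0, 1))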
plt.figure(figsize=(14, 14))
plt.title("Bitcoin Who-Trusts-Whom Network (1-Month Sample)", fontsize=18)
untrusted = mpatches.Patch(color='red', label='Average Negative Reputation')
trusted = mpatches.Patch(color='green', label='Average Positive Reputation')
size = mpatches.Patch(color='white', label='Node Size - Rating Count')
# named edge_note so the legend patch does not overwrite the edges list
edge_note = mpatches.Patch(color='white', label='Edge Length - Mistrust of Reviewing Node')
plt.legend(handles=[trusted, untrusted, size, edge_note], loc='lower right')
pos = nx.spring_layout(DG, k=0.25)
sizes = [get_size(n) for n in DG]
colors = [get_color(n) for n in DG]
nc = nx.draw_networkx_nodes(
    DG, pos, nodelist=DG.nodes(), node_size=sizes, linewidths=2.0,
    node_color=colors, cmap=plt.cm.RdYlGn, alpha=0.8,
    # pin the color scale so 0.5 always maps to the neutral midpoint
    vmin=0.0, vmax=1.0,
)
ec = nx.draw_networkx_edges(DG, pos, arrows=True, alpha=0.08)
ax = plt.axis('off')
plt.show()
Observations¶
- Larger nodes appear to cluster together as expected
- Nodes with more ratings seem less likely to maintain a high average rating
- Highly negatively rated nodes (dark red) appear to be pushed to the perimeter and away from positive / high transaction clusters
- A few nodes have consistently been negatively rated (large red)
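The relationship between rating count and average rating is read off the plot by eye. One way it could be checked numerically is sketched below on invented data: aggregate per rated user, then correlate the count with the mean. The `toy` frame is hypothetical, and the Pearson correlation is only one possible choice of measure.

```python
import pandas as pd

# Hypothetical ratings, invented for illustration (not real data)
toy = pd.DataFrame({
    "target": [1, 1, 1, 1, 2, 2, 3],
    "rating": [10, -5, 2, 1, 8, 6, 10],
})

# One row per rated user: number of ratings received and their mean
per_user = toy.groupby("target")["rating"].agg(["count", "mean"])
print(per_user)

# Pearson correlation between rating count and mean rating;
# a negative value would be consistent with the second observation
rho = per_user["count"].corr(per_user["mean"])
print(f"Correlation: {rho:.2f}")
```

Running the same aggregation on `df_` instead of `toy` would test the observation on the sampled month.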
Areas to explore¶
- It would be interesting to see how this changes month-to-month and track nodes by id
- A better visualization might avoid overlapping nodes or bundle edges, as shown here: http://holoviews.org/user_guide/Network_Graphs.html
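As a starting point for the month-to-month idea, here is a rough sketch that builds one `MultiDiGraph` per calendar month and follows a single node id across them. The data, the `graphs` dictionary, and the choice of `node_id` are assumptions for illustration, not part of the analysis above.

```python
import networkx as nx
import pandas as pd

# Hypothetical ratings indexed by time, invented for illustration
toy = pd.DataFrame(
    {
        "source": [1, 2, 1, 3],
        "target": [2, 3, 3, 1],
        "rating": [5.0, -2.0, 4.0, 10.0],
    },
    index=pd.to_datetime(
        ["2014-03-02", "2014-03-20", "2014-04-05", "2014-04-18"]
    ),
)

# Build one MultiDiGraph per calendar month, keyed by a string like "2014-03"
graphs = {}
for period, chunk in toy.groupby(toy.index.to_period("M")):
    G = nx.MultiDiGraph()
    rows = chunk[["source", "target", "rating"]]
    G.add_weighted_edges_from(rows.itertuples(index=False, name=None))
    graphs[str(period)] = G

# Track a single node id across months via the ratings it received
node_id = 3
for month, G in graphs.items():
    if node_id in G:
        incoming = [w for _, _, w in G.in_edges(node_id, data="weight")]
    else:
        incoming = []
    print(month, incoming)
```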
References¶
- Stanford Network Analysis Project, Bitcoin OTC trust weighted signed network
- S. Kumar, F. Spezzano, V.S. Subrahmanian, C. Faloutsos. Edge Weight Prediction in Weighted Signed Networks. IEEE International Conference on Data Mining (ICDM), 2016.
- NetworkX Tutorial, Multigraphs
- Fruchterman, T. M. J., & Reingold, E. M. (1991). Graph Drawing by Force-Directed Placement. Software: Practice and Experience, 21(11).
- HoloViews, Creating interactive network graphs