Create Adjacency Matrix in Python for large Dataset
I have a problem representing website user behaviour as an adjacency matrix in Python. I want to analyze user interaction across 43 different websites to see which websites are used together.
The data set has about 13,000,000 lines with the following structure:
user website
id1 web1
id1 web2
id1 web2
id2 web1
id2 web2
id3 web3
id3 web2
I would like to visualize the interactions between the websites in an adjacency matrix like this:
     web1 web2 web3
web1    2    2    0
web2    2    4    1
web3    0    1    1
I'd be grateful for any advice.
import numpy as np
import scipy.sparse
data = """
id1 web1
id1 web2
id1 web2
id2 web1
id2 web2
id3 web3
id3 web2
"""
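# Split the whitespace-separated text into (user, website) rows
data = np.array(data.split()).reshape(-1, 2)
# Map each user and each website to an integer index
_, i = np.unique(data[:, 0], return_inverse=True)
_, j = np.unique(data[:, 1], return_inverse=True)
# Sparse user-by-website incidence matrix; duplicate (user, website) pairs are summed
incidence = scipy.sparse.coo_matrix((np.ones_like(i), (i, j)))
# Website-by-website adjacency matrix
adjacency = incidence.T @ incidence
print(adjacency.todense())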
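Note that the incidence matrix above sums repeated (user, website) rows, so a user who visits the same website twice is weighted twice in the product. If each user should count only once per website, one possible variant is to binarize the incidence matrix before multiplying. The following is a minimal sketch under that assumption; the function name `cooccurrence_by_user` and its return convention are hypothetical and not part of the original answer, and which counting rule matches the desired output is left to the reader.

```python
import numpy as np
import scipy.sparse


def cooccurrence_by_user(users, websites):
    """Hypothetical helper: for each pair of websites, count the distinct
    users that appear with both. Repeated visits count once."""
    user_labels, i = np.unique(users, return_inverse=True)
    web_labels, j = np.unique(websites, return_inverse=True)
    # User-by-website incidence matrix; tocsr() sums duplicate entries
    incidence = scipy.sparse.coo_matrix(
        (np.ones(len(i), dtype=np.int64), (i, j)),
        shape=(len(user_labels), len(web_labels)),
    ).tocsr()
    # Collapse visit counts to 0/1 so each user contributes at most once
    incidence.data[:] = 1
    adjacency = (incidence.T @ incidence).toarray()
    return web_labels, adjacency


# Sample rows from the question
users = ["id1", "id1", "id1", "id2", "id2", "id3", "id3"]
websites = ["web1", "web2", "web2", "web1", "web2", "web3", "web2"]

labels, adjacency = cooccurrence_by_user(users, websites)
print(labels)
print(adjacency)
```

With this variant the diagonal holds the number of distinct users per website and each off-diagonal entry holds the number of users shared by two websites. With only 43 websites the result is a 43 x 43 matrix, so converting it to a dense array at the end should be cheap regardless of the number of input rows.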