
NetworkX csv edgelist structure

Is there a standard structure for adding edges from a csv/txt file into NetworkX? I've read the docs and have tried using read_edgelist('path.csv') and add_edges_from('path.csv'), but I received errors saying my data cannot be converted into dictionaries, and also "Edge tuple C must be a 2-tuple or a 3-tuple". I've reformatted a sample of my data several ways to test different structures, including lists of lists and lists of tuples, removing white space, and creating a single list of numbers in each row, but had no luck. Below is some of my sample data:

user_id,cluster_moves
11011,"[[86, 110], [110, 110]]"
2139671,"[[89, 125]]"
3945641,"[[36, 73], [73, 110], [110, 110]]"
10024312,"[[123, 27], [27, 97], [97, 97], [97, 97], [97,110]]"
14270422,"[[0, 110], [110, 174]]"
14283758,"[[110, 184]]"
14373703,"[[35, 97], [97, 97], [97, 97], [97, 17], [17,58]]"

The purpose is to create a network graph of trajectories moving between (or within) clusters. Each list is a move either within a cluster or between clusters, e.g. [[0, 110], [110, 174]] is a move from clusters 0 -> 110 -> 174. Is there a way to format my data so that networkx can read it?
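For reference, nx.read_edgelist expects a plain text file with one edge per line, the two node labels separated by whitespace by default, rather than a CSV column of nested lists. A minimal sketch, assuming the moves have already been flattened into such a file (the file name edges.txt and its contents are hypothetical, copied from the sample moves above):

```python
import os
import tempfile

import networkx as nx

# Hypothetical edge list: one "source target" pair per line, using moves
# from the sample data. Whitespace is read_edgelist's default delimiter.
EDGE_LINES = ["86 110", "110 110", "89 125", "0 110", "110 174"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("\n".join(EDGE_LINES) + "\n")

    # nodetype=int turns the label "110" into the integer 110;
    # without it every node would be a string.
    g = nx.read_edgelist(path, nodetype=int)

print(g.number_of_nodes(), g.number_of_edges())
```

A comma-separated file with lines such as 86,110 should read the same way if delimiter=',' is passed to read_edgelist.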

Quick sample code I was testing the data with:

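import networkx as nx
import matplotlib.pyplot as plt

g = nx.Graph()
# add_edges_from expects an iterable of edge tuples, not a file path;
# given a string it iterates over the characters and raises
# "Edge tuple ... must be a 2-tuple or 3-tuple."
edges = g.add_edges_from('path.csv')  # returns None, so edges is None

nx.draw(g)
plt.draw  # no effect: missing parentheses (nx.draw already draws)
plt.show()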
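The "Edge tuple ... must be a 2-tuple or 3-tuple" error comes from passing a string to add_edges_from: the method iterates over whatever it is given, so a path string yields single characters, none of which is an edge. A small sketch reproducing this and showing the kind of input the method does accept (the node values are taken from the sample data):

```python
import networkx as nx

g = nx.Graph()

try:
    # A string is iterable, so each character ("p", "a", ...) is treated
    # as an edge tuple of length 1, which NetworkX rejects.
    g.add_edges_from("path.csv")
except nx.NetworkXError as err:
    message = str(err)
    print("rejected:", message)

# add_edges_from wants an iterable of (u, v) pairs or (u, v, attr_dict) triples.
g.add_edges_from([(86, 110), (110, 110)])
print(sorted(g.nodes))
```

So the file has to be read and parsed first; only the resulting list of pairs goes to add_edges_from.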

Edit

Is it possible to add edge weights to this data structure when reading it into networkx, and then adjust each weight based on the count/frequency of that edge? I would like to do this so I can visualize edges with a higher frequency/count in another color or line weight. Using the answer below, I have tried using g.add_weighted_edges_from() with weight=1 as an attribute instead of g.add_edges_from(), but this did not work properly. I also tried the following, with no luck:

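# g.edges() yields (u, v) 2-tuples, so unpacking into u, v, d fails;
# g.edges(data=True) would yield (u, v, data_dict) triples instead
for u,v,d in g.edges():
    d['weight'] = 1
g.edges(data=True)  # result is discarded, so this line has no effect
edges = g.edges()
weights = [g[u][v]['weight'] for u,v in edges]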
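For reference, a loop like the one above can work if it iterates over g.edges(data=True), which yields (u, v, data) triples where data is the edge's live attribute dictionary, so assignments to it are kept in the graph. A minimal sketch on a small hypothetical graph built from the first sample row:

```python
import networkx as nx

# Small hypothetical graph built from the first sample row.
g = nx.Graph()
g.add_edges_from([(86, 110), (110, 110)])

# data=True yields (u, v, attr_dict); mutating attr_dict updates the graph.
for u, v, d in g.edges(data=True):
    d["weight"] = 1

edges = g.edges()
weights = [g[u][v]["weight"] for u, v in edges]
print(weights)
```

nx.set_edge_attributes(g, 1, "weight") sets the same constant weight on every edge in a single call.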

First of all, your data is not a valid CSV file. From Comma-separated values:

Fields with embedded commas or double-quote characters must be quoted.

This means you should wrap each list in double quotes:

user_id,cluster_moves
11011,"[[86, 110], [110, 110]]"
2139671,"[[89, 125]]"
3945641,"[[36, 73], [73, 110], [110, 110]]"
10024312,"[[123, 27], [27, 97], [97, 97], [97, 97], [97,110]]"
14270422,"[[0, 110], [110, 174]]"
14283758,"[[110, 184]]"
14373703,"[[35, 97], [97, 97], [97, 97], [97, 17], [17,58]]"
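To see why the quoting matters, here is a small check with Python's csv module on one sample row: without quotes, every comma inside the list starts a new field; with quotes, the whole list stays in the second field.

```python
import csv
import io

quoted = '11011,"[[86, 110], [110, 110]]"\n'
unquoted = "11011,[[86, 110], [110, 110]]\n"

# csv.reader splits on every comma that is not inside double quotes.
quoted_fields = next(csv.reader(io.StringIO(quoted)))
unquoted_fields = next(csv.reader(io.StringIO(unquoted)))

print(quoted_fields)    # ['11011', '[[86, 110], [110, 110]]']
print(unquoted_fields)  # ['11011', '[[86', ' 110]', ' [110', ' 110]]']
```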

You can then use the csv module to read this file, convert each list string to a list with eval(), and build the network graph with add_edges_from:

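import csv
import networkx as nx
import matplotlib.pyplot as plt

g = nx.Graph()
with open('ooo.csv', 'r', newline='') as f:
    for row in csv.reader(f):
        if '[' in row[1]:  # skip the header row, whose second field is not a list
            # row[1] is a string such as "[[86, 110], [110, 110]]";
            # eval() turns it into a list of [u, v] pairs, i.e. edges
            g.add_edges_from(eval(row[1]))

nx.draw(g)
plt.show()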
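Two adjustments may help with the weighting question in the edit. First, ast.literal_eval parses the list strings like eval() does but only accepts Python literals, so it will not execute arbitrary code from the file. Second, counting each move with collections.Counter gives a frequency that can be stored as the edge weight via add_weighted_edges_from. A sketch under these assumptions: the CSV content is inlined with io.StringIO so the example is self-contained (in practice, open the file instead), and a DiGraph is used because the moves have a direction (0 -> 110 -> 174).

```python
import ast
import csv
import io
from collections import Counter

import networkx as nx

# Inlined copy of part of the sample data; replace with open("ooo.csv", newline="").
SAMPLE = '''user_id,cluster_moves
11011,"[[86, 110], [110, 110]]"
3945641,"[[36, 73], [73, 110], [110, 110]]"
10024312,"[[123, 27], [27, 97], [97, 97], [97, 97], [97,110]]"
'''

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row

counts = Counter()
for _user_id, moves in reader:
    # literal_eval only evaluates literals such as lists and ints
    for u, v in ast.literal_eval(moves):
        counts[(u, v)] += 1

g = nx.DiGraph()
# add_weighted_edges_from takes (u, v, weight) triples
g.add_weighted_edges_from((u, v, n) for (u, v), n in counts.items())

# One weight per edge, in the same order nx.draw iterates g.edges()
weights = [d["weight"] for _, _, d in g.edges(data=True)]
```

The weights list can then be passed to the drawing call, for example nx.draw(g, width=weights), so that more frequent moves are drawn thicker. If an undirected nx.Graph is preferred, count tuple(sorted((u, v))) instead; otherwise a move and its reverse would be counted separately but stored on the same edge.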


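Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM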