
Python: How to create graph nodes and edges from a CSV file?

I have a CSV data file where each row denotes one event. A simplified example would be:

  Datetime                 ColA     ColB    ColC   
  2015/07/12 08:45:34      ABC       12      
  2015/07/12 08:46:04      DCD       10     ABC 
  2015/07/12 08:46:23      XYZ       34     ABC 
  2015/07/12 08:46:56      MNO       10     XYZ
  2015/07/12 08:46:56      FGH       20     

So, each row will be a node whose properties are the values of the columns Datetime, ColA, ColB, and ColC. The nodes are connected by a relationship between ColA and ColC.

In this example there is an edge from row 1 to row 2 and to row 3, since ColC of the latter rows is equal to ColA of the first row. Row 3 and row 4 are similarly connected by a directed edge.

Row 1 has no ColC, so it is not connected to any node higher up. The same is true of row 5.

How can I create a graph data structure in Python that captures this relationship? The rows should all be ordered chronologically, and if two rows have a ColA that matches a row's ColC, the one closer in time should be chosen.

You could build a bipartite graph with the datetimes as one part and the ColA/ColC values as the other. Then "project" the graph onto the datetime nodes: create a link between two datetimes if they both link to the same ColA/ColC node.

Here is some code that shows one way to do that and produces an undirected graph. I didn't understand what the directions meant in your example.

import csv
import io
import networkx as nx
from networkx.algorithms import bipartite

data = """Datetime,ColA,ColB,ColC
2015/07/12 08:45:34,ABC,12,
2015/07/12 08:46:04,DCD,10,ABC
2015/07/12 08:46:23,XYZ,34,ABC
2015/07/12 08:46:56,MNO,10,XYZ
2015/07/12 08:46:56,FGH,20,"""

G = nx.Graph()
csvfile = io.StringIO(data)  # treat the CSV text as a file-like object
reader = csv.DictReader(csvfile)
nodes = []  # datetime side of the bipartite graph
for row in reader:
    # Rows that share a timestamp collapse into a single datetime node.
    nodes.append(row['Datetime'])
    G.add_node(row['Datetime'])
    if row['ColA'] != '':
        G.add_edge(row['Datetime'], row['ColA'])
    if row['ColC'] != '':
        G.add_edge(row['Datetime'], row['ColC'])
print(list(G.edges()))
# Link two datetimes whenever they share a ColA/ColC neighbour.
B = bipartite.projected_graph(G, nodes)
print(list(B.edges()))

OUTPUT (edge order may differ depending on the Python and networkx versions)

[('2015/07/12 08:46:23', 'XYZ'), ('2015/07/12 08:46:23', 'ABC'), ('ABC', '2015/07/12 08:46:04'), ('ABC', '2015/07/12 08:45:34'), ('DCD', '2015/07/12 08:46:04'), ('FGH', '2015/07/12 08:46:56'), ('2015/07/12 08:46:56', 'XYZ'), ('2015/07/12 08:46:56', 'MNO')]
[('2015/07/12 08:46:23', '2015/07/12 08:46:04'), ('2015/07/12 08:46:23', '2015/07/12 08:46:56'), ('2015/07/12 08:46:23', '2015/07/12 08:45:34'), ('2015/07/12 08:46:04', '2015/07/12 08:45:34')]
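
If directed edges are wanted, as the question describes, one possible reading is sketched below using only the standard library. It is a sketch under assumptions, not the only way to do it: "closer in time" is taken to mean the most recent earlier row whose ColA equals the current row's ColC, only earlier rows can be parents, and nodes are keyed by row number (after sorting by time) so that rows sharing a timestamp stay distinct. The names build_directed_edges and latest_by_cola are made up for this illustration.

```python
import csv
import io
from datetime import datetime

data = """Datetime,ColA,ColB,ColC
2015/07/12 08:45:34,ABC,12,
2015/07/12 08:46:04,DCD,10,ABC
2015/07/12 08:46:23,XYZ,34,ABC
2015/07/12 08:46:56,MNO,10,XYZ
2015/07/12 08:46:56,FGH,20,"""


def build_directed_edges(csv_text):
    """Return (nodes, edges).

    nodes maps a 1-based row number to that row's values;
    edges is a list of (parent_row, child_row) pairs.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["when"] = datetime.strptime(row["Datetime"], "%Y/%m/%d %H:%M:%S")
    # Stable sort: rows with equal timestamps keep their file order.
    rows.sort(key=lambda r: r["when"])

    nodes = {}
    edges = []
    latest_by_cola = {}  # ColA value -> number of the latest row seen with it
    for number, row in enumerate(rows, start=1):
        nodes[number] = row
        # Assumption: the parent is the most recent earlier row whose
        # ColA equals this row's ColC.
        parent = latest_by_cola.get(row["ColC"]) if row["ColC"] else None
        if parent is not None:
            edges.append((parent, number))
        if row["ColA"]:
            latest_by_cola[row["ColA"]] = number
    return nodes, edges


nodes, edges = build_directed_edges(data)
print(edges)
```

For the sample data this prints [(1, 2), (1, 3), (3, 4)], which matches the edges described in the question. The pairs can be loaded into a networkx DiGraph with add_nodes_from and add_edges_from if graph algorithms are needed afterwards.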


 