I have a CSV data file where each row denotes one event. A simplified example:
Datetime ColA ColB ColC
2015/07/12 08:45:34 ABC 12
2015/07/12 08:46:04 DCD 10 ABC
2015/07/12 08:46:23 XYZ 34 ABC
2015/07/12 08:46:56 MNO 10 XYZ
2015/07/12 08:46:56 FGH 20
So, each row will be a node with properties given by the values of its columns Datetime, ColA, ColB and ColC. These nodes are connected by a relationship between ColA and ColC.
So, in this example there is an edge from row 1 to row 2 and to row 3, since ColC of the latter rows is equal to ColA of the first row. Row 3 and row 4 are similarly connected by a directed edge.
Row 1 has no ColC, so it is not connected to any node higher up; the same goes for row 5.
How can I create a graph data structure to represent this relationship in Python? The rows should all be ordered chronologically, and if two rows have a ColA that matches a row's ColC, the one closer in time is chosen.
You could build a bipartite graph with the datetime as one part and the ColA/ColC values as the other. Then "project" the graph onto the datetime nodes - create a link between two datetimes if they both link to a ColA/ColC node.
Here is some code that shows one way to do that to create an undirected graph. I didn't understand what the directions meant in your example.
import csv
import io
import networkx as nx
from networkx.algorithms import bipartite

data = """Datetime,ColA,ColB,ColC
2015/07/12 08:45:34,ABC,12,
2015/07/12 08:46:04,DCD,10,ABC
2015/07/12 08:46:23,XYZ,34,ABC
2015/07/12 08:46:56,MNO,10,XYZ
2015/07/12 08:46:56,FGH,20,"""

G = nx.Graph()
csvfile = io.StringIO(data)
reader = csv.DictReader(csvfile)
nodes = []  # the datetime part of the bipartite graph
for row in reader:
    # Rows sharing a timestamp (the last two here) map to the same node.
    nodes.append(row['Datetime'])
    G.add_node(row['Datetime'])
    # Link the datetime to its ColA and ColC values, skipping empty cells.
    if row['ColA'] != '':
        G.add_edge(row['Datetime'], row['ColA'])
    if row['ColC'] != '':
        G.add_edge(row['Datetime'], row['ColC'])
print(G.edges())

# Two datetimes are linked if they share a ColA/ColC neighbour.
B = bipartite.projected_graph(G, nodes)
print(B.edges())
OUTPUT (edge order and the orientation of each pair may differ between Python and networkx versions)
[('2015/07/12 08:46:23', 'XYZ'), ('2015/07/12 08:46:23', 'ABC'), ('ABC', '2015/07/12 08:46:04'), ('ABC', '2015/07/12 08:45:34'), ('DCD', '2015/07/12 08:46:04'), ('FGH', '2015/07/12 08:46:56'), ('2015/07/12 08:46:56', 'XYZ'), ('2015/07/12 08:46:56', 'MNO')]
[('2015/07/12 08:46:23', '2015/07/12 08:46:04'), ('2015/07/12 08:46:23', '2015/07/12 08:46:56'), ('2015/07/12 08:46:23', '2015/07/12 08:45:34'), ('2015/07/12 08:46:04', '2015/07/12 08:45:34')]
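If the edges should be directed as the question describes, one possible reading is: each row with a non-empty ColC gets an incoming edge from the most recent earlier row whose ColA equals that ColC. The sketch below follows that reading using only the standard library. The function name build_event_graph, the use of 1-based row numbers as node ids (which avoids merging rows that share a timestamp), and the "most recent earlier row" interpretation of "closer in time" are all assumptions, not something the question confirms.

```python
import csv
import io
from datetime import datetime

data = """Datetime,ColA,ColB,ColC
2015/07/12 08:45:34,ABC,12,
2015/07/12 08:46:04,DCD,10,ABC
2015/07/12 08:46:23,XYZ,34,ABC
2015/07/12 08:46:56,MNO,10,XYZ
2015/07/12 08:46:56,FGH,20,"""


def build_event_graph(csv_text):
    """Return (nodes, edges): nodes maps row number -> row dict,
    edges is a list of (parent_row, child_row) pairs."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Order rows chronologically; list.sort() is stable, so rows with
    # the same timestamp keep their order from the file.
    rows.sort(key=lambda r: datetime.strptime(r["Datetime"],
                                              "%Y/%m/%d %H:%M:%S"))

    nodes = {}
    edges = []
    latest_by_col_a = {}  # ColA value -> row number of the latest row seen
    for node_id, row in enumerate(rows, start=1):
        nodes[node_id] = row
        # Assumption: the parent is the most recent earlier row whose
        # ColA matches this row's ColC.
        parent = latest_by_col_a.get(row["ColC"]) if row["ColC"] else None
        if parent is not None:
            edges.append((parent, node_id))
        if row["ColA"]:
            latest_by_col_a[row["ColA"]] = node_id
    return nodes, edges


nodes, edges = build_event_graph(data)
print(edges)  # [(1, 2), (1, 3), (3, 4)]
```

For the sample data this gives edges from row 1 to rows 2 and 3, and from row 3 to row 4, matching the example in the question. If graph algorithms are needed, the edge list can be passed to networkx with nx.DiGraph(edges).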