
Node frequency using networkx

I'm just learning Python, so I appreciate the help. I have a two-column data set: the first column is a unique ID and the second is a comma-separated string of items. I'm using NetworkX to build a tree from the data (see below). I need to know how often each item occurs at each level. For example, for path A (1, 2, 3, 4), the counts for its nodes should be 1:4, 2:2, 3:2, and 4:2. How do I get these node counts?

My data looks like this:

A      1, 2, 3, 4
B      1, 2, 1, 4
C      1, 3, 4, 3
D      1, 4, 3, 2

The code I have so far is the following:

import networkx as nx
import matplotlib.pyplot as plt

#create graph
G = nx.MultiGraph()

#read in strings from csv
testfile = 'C:…file.txt'

with open(testfile, "r") as f:
    line = f.readline
    f = (i for i in f if '\t' in i.rstrip())
    for line in f:
        customerID, path = line.rstrip().split("\t")
        path2 =  path.rstrip("\\").rstrip("}").split(",")
        pathInt = list()
        for x in path2:
            if x is not None:
                newx = int(x)
                pathInt.append(newx)
                print(pathInt)
        varlength = len(pathInt)
        pathTuple = tuple(pathInt)
        G.add_path([pathTuple[:i+1] for i in range(0, varlength)])

nx.draw(G)
plt.show() # display

First, you can make the conversion from your list of strings to a tuple of ints a bit more concise:

pathTuple = tuple(int(x) for x in path2)
#recent NetworkX versions removed G.add_path; nx.add_path(G, nodes) is the replacement
nx.add_path(G, [pathTuple[:i+1] for i in range(len(pathTuple))])
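To make the list comprehension concrete, here is a minimal sketch using path A from the sample data. Each tree node is a prefix of the path, so the same item reached through different parents becomes a different node. The name `prefixes` is only illustrative and does not appear in the original code.

```python
# Path A from the question's sample data, already converted to ints.
pathTuple = (1, 2, 3, 4)

# One prefix per level: the root is (1,), its child is (1, 2), and so on.
prefixes = [pathTuple[:i + 1] for i in range(len(pathTuple))]

print(prefixes)
# [(1,), (1, 2), (1, 2, 3), (1, 2, 3, 4)]
```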

To store the counts, I would use a defaultdict of defaultdicts: a data structure that allows double indexing and defaults to 0 for any key it has not seen yet.

import collections
counts = collections.defaultdict(lambda: collections.defaultdict(int))

This supports lookups of the form counts[level][node], so we can count how often each node appears on each level by looking at its position in the path.
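As a quick check that does not need the input file, the sketch below applies the same counting to the four sample rows from the question. The `paths` dictionary is an assumption standing in for the parsed file contents, and levels are numbered from 0.

```python
import collections

# Sample rows from the question, assumed to be already parsed into int tuples.
paths = {
    "A": (1, 2, 3, 4),
    "B": (1, 2, 1, 4),
    "C": (1, 3, 4, 3),
    "D": (1, 4, 3, 2),
}

# counts[level][node] -> number of paths that have `node` at position `level`
counts = collections.defaultdict(lambda: collections.defaultdict(int))

for path in paths.values():
    for level, node in enumerate(path):
        counts[level][node] += 1

for level in sorted(counts):
    print(level, dict(counts[level]))
# 0 {1: 4}
# 1 {2: 2, 3: 1, 4: 1}
# 2 {3: 2, 1: 1, 4: 1}
# 3 {4: 2, 3: 1, 2: 1}
```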

After this your code would look like this:

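import collections
import networkx as nx

#create graph
G = nx.MultiGraph()

#counts[level][node] -> number of paths that have `node` at position `level`
counts = collections.defaultdict(lambda: collections.defaultdict(int))

#read in strings from csv
testfile = 'C:…file.txt'

with open(testfile, "r") as f:
    #keep only lines that contain a tab separator
    lines = (i for i in f if '\t' in i.rstrip())
    for line in lines:
        customerID, path = line.rstrip().split("\t")
        path2 = path.rstrip("\\").rstrip("}").split(",")
        pathTuple = tuple(int(x) for x in path2)
        #recent NetworkX versions removed G.add_path; nx.add_path(G, nodes) is the replacement
        nx.add_path(G, [pathTuple[:i+1] for i in range(len(pathTuple))])

        #count the parsed ints, not the characters of the raw string
        for level, node in enumerate(pathTuple):
            counts[level][node] += 1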

And you can then do this:

level = 0
node = 1
print('Node {} appears {} times on level {}'.format(node, counts[level][node], level))
# Node 1 appears 4 times on level 0
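To get the per-path output the question asks for (1:4, 2:2, 3:2, 4:2 for path A), you can look up every node of a path at its own level. This is only a sketch: `path_counts` is a hypothetical helper, and `paths` is an assumption that stands in for the rows read from the file.

```python
import collections

# Sample rows from the question, assumed to be already parsed into int tuples.
paths = [(1, 2, 3, 4), (1, 2, 1, 4), (1, 3, 4, 3), (1, 4, 3, 2)]

# counts[level][node] -> number of paths that have `node` at position `level`
counts = collections.defaultdict(lambda: collections.defaultdict(int))
for path in paths:
    for level, node in enumerate(path):
        counts[level][node] += 1


def path_counts(path, counts):
    """Hypothetical helper: pair each node of `path` with its count at that level."""
    return [(node, counts[level][node]) for level, node in enumerate(path)]


print(path_counts((1, 2, 3, 4), counts))
# [(1, 4), (2, 2), (3, 2), (4, 2)]
```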
