Node frequency using networkx

Question

I'm just learning Python, so I appreciate the help. I have a two-column data set: the first column is a unique ID, and the second is a string of items. I'm using NetworkX to build a tree from the data (see below). I need to know the item frequency per level. For example, for the path in A (1, 2, 3, 4), the counts for each node should be 1:4, 2:2, 3:2, and 4:2. How do I get the node count?

My data looks like this:

A      1, 2, 3, 4
B      1, 2, 1, 4
C      1, 3, 4, 3
D      1, 4, 3, 2

The code I have so far is the following:

import matplotlib.pyplot as plt
import networkx as nx

#create graph
G = nx.MultiGraph()

#read in tab-separated lines (ID<TAB>path) from the file
testfile = 'C:…file.txt'

with open(testfile, "r") as f:
    f = (i for i in f if '\t' in i.rstrip())
    for line in f:
        customerID, path = line.rstrip().split("\t")
        path2 = path.rstrip("\\").rstrip("}").split(",")
        pathInt = list()
        for x in path2:
            if x.strip():  # skip empty fields; int("") would fail
                newx = int(x)
                pathInt.append(newx)
        print(pathInt)
        varlength = len(pathInt)
        pathTuple = tuple(pathInt)
        # G.add_path() was removed in NetworkX 2.4; use nx.add_path()
        nx.add_path(G, [pathTuple[:i+1] for i in range(0, varlength)])

nx.draw(G)
plt.show() # display

Answer 1

First, you can make the conversion from your string list to an int tuple a bit more concise:

pathTuple = tuple(int(x) for x in path2)
nx.add_path(G, [pathTuple[:i+1] for i in range(len(pathTuple))])
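A minimal sketch, using only the standard library, of what those two lines do for row A of the question's data. `prefixes` and `edges` are illustrative names; the consecutive-pair edges are meant to mirror what `nx.add_path` adds between successive nodes, without calling NetworkX itself.

```python
# Row A from the question's data, already split off from its ID.
path = "1, 2, 3, 4"

# int() ignores surrounding whitespace, so " 2" parses fine.
pathTuple = tuple(int(x) for x in path.split(","))

# Each tree node is the prefix of the path that leads to it.
prefixes = [pathTuple[:i + 1] for i in range(len(pathTuple))]

# add_path links consecutive nodes, i.e. these (parent, child) pairs.
edges = list(zip(prefixes, prefixes[1:]))

print(pathTuple)  # (1, 2, 3, 4)
print(prefixes)   # [(1,), (1, 2), (1, 2, 3), (1, 2, 3, 4)]
print(edges[0])   # ((1,), (1, 2))
```

Because each node is a whole prefix, two customers share a tree node only while their paths agree from the start.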

To store the counts, I would use a defaultdict nested inside a defaultdict: a structure that is indexed twice (first by level, then by node) and returns 0 for any entry that has not been set yet.

import collections
counts = collections.defaultdict(lambda: collections.defaultdict(int))
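A short sketch of how the nested defaultdict behaves. Passing `int` as the inner factory is equivalent to `lambda: 0`, since `int()` returns 0. The level and node values here are arbitrary examples.

```python
import collections

# Outer dict: level -> inner dict; inner dict: node -> count (default 0).
counts = collections.defaultdict(lambda: collections.defaultdict(int))

print(counts[5][99])  # 0, no KeyError for an unseen level/node

counts[0][1] += 1
counts[0][1] += 1
print(counts[0][1])   # 2
```

One side effect to keep in mind: merely reading `counts[5][99]` inserts level 5 and node 99 into the structure, so iterate over `counts` only after you are done with speculative lookups.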

This allows access of the form counts[level][node]. We can then count how often each node appears on each level, using each node's position in the path as its level.
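The position-based counting can be sketched on the question's four sample paths, hard-coded here as int tuples to skip the file parsing. `enumerate` supplies the level (starting at 0) for each node; `freqA` is an illustrative name.

```python
import collections

# The question's sample data, already converted to int tuples.
paths = {
    "A": (1, 2, 3, 4),
    "B": (1, 2, 1, 4),
    "C": (1, 3, 4, 3),
    "D": (1, 4, 3, 2),
}

counts = collections.defaultdict(lambda: collections.defaultdict(int))
for pathTuple in paths.values():
    # enumerate yields (position, item); the position is the level.
    for level, node in enumerate(pathTuple):
        counts[level][node] += 1

# Frequency of each node of path A at the level where it appears.
freqA = [(node, counts[level][node]) for level, node in enumerate(paths["A"])]
print(freqA)  # [(1, 4), (2, 2), (3, 2), (4, 2)]
```

This reproduces the 1:4, 2:2, 3:2, 4:2 from the question. A list of pairs is used rather than a dict because a path such as B repeats a node on different levels.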

After this your code would look like this:

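import collections

import networkx as nx

#create graph
G = nx.MultiGraph()

#counts[level][node], defaulting to 0
counts = collections.defaultdict(lambda: collections.defaultdict(int))

#read in tab-separated lines (ID<TAB>path) from the file
testfile = 'C:…file.txt'

with open(testfile, "r") as f:
    f = (i for i in f if '\t' in i.rstrip())
    for line in f:
        customerID, path = line.rstrip().split("\t")
        path2 = path.rstrip("\\").rstrip("}").split(",")
        pathTuple = tuple(int(x) for x in path2)
        nx.add_path(G, [pathTuple[:i+1] for i in range(len(pathTuple))])

        # count the parsed ints, not the characters of the raw string
        for level, node in enumerate(pathTuple):
            counts[level][node] += 1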
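To try the loop without a file on disk, here is a sketch that feeds the question's data through `io.StringIO`, assuming the columns are tab-separated as the `split("\t")` call implies. It keeps the parsing and counting but leaves out the graph and the data-specific `rstrip` calls, so it runs with the standard library alone; `count_levels` and `SAMPLE` are hypothetical names.

```python
import collections
import io

# The question's sample data, assumed tab-separated (ID<TAB>path).
SAMPLE = "A\t1, 2, 3, 4\nB\t1, 2, 1, 4\nC\t1, 3, 4, 3\nD\t1, 4, 3, 2\n"


def count_levels(f):
    """Return counts[level][node] for every tab-separated line in f."""
    counts = collections.defaultdict(lambda: collections.defaultdict(int))
    for line in f:
        line = line.rstrip()
        if "\t" not in line:
            continue  # skip blank or malformed lines
        customerID, path = line.split("\t")
        pathTuple = tuple(int(x) for x in path.split(","))
        for level, node in enumerate(pathTuple):
            counts[level][node] += 1
    return counts


counts = count_levels(io.StringIO(SAMPLE))
print(counts[0][1])     # 4
print(dict(counts[1]))  # {2: 2, 3: 1, 4: 1}
```

With a real file, `with open(testfile) as f: counts = count_levels(f)` should behave the same way, since a file object also yields one line per iteration.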

You can then look up a count like this (levels are counted from 0):

level = 0
node = 1
print('Node', node, 'appears', counts[level][node], 'times on level', level)
# Output: Node 1 appears 4 times on level 0
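To print every level instead of a single lookup, a sketch that walks the nested dict in sorted order. The `counts` built here from the question's four sample paths stands in for the one produced by the loop above; `lines` is an illustrative name.

```python
import collections

paths = [(1, 2, 3, 4), (1, 2, 1, 4), (1, 3, 4, 3), (1, 4, 3, 2)]

counts = collections.defaultdict(lambda: collections.defaultdict(int))
for pathTuple in paths:
    for level, node in enumerate(pathTuple):
        counts[level][node] += 1

# One line per level, nodes in ascending order.
lines = []
for level in sorted(counts):
    row = ", ".join(f"{node}:{n}" for node, n in sorted(counts[level].items()))
    lines.append(f"level {level}: {row}")
print("\n".join(lines))
# level 0: 1:4
# level 1: 2:2, 3:1, 4:1
# level 2: 1:1, 3:2, 4:1
# level 3: 2:1, 3:1, 4:2
```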

solution1
0 ACCEPTED 2012-08-08 19:59:48