简体   繁体   中英

Networkx Python Edges Comparison

I have been trying to build a graph for a project and I have been trying to identify newly added edges after populating it with more information.

For instance below you can see its first and second iteration:

---------------------- General Info Graph H-----------------------------

Total number of Nodes in Graph:  2364
Total number of Edges:  3151

---------------------- General Info Graph G -----------------------------

Total number of Nodes in Graph:  6035
Total number of Edges:  11245

The problem I have been facing is when I try to identify newly added edges using the code:

counter = 0
edges_all = list(G.edges_iter(data=True)) 
edges_before = list(H.edges_iter(data=True)) 
print "How many edges in old graph: ", len(edges_before)
print "How many edges in new graph: ", len(edges_all)
edge_not_found = []
for edge in edges_all:
    if edge in edges_before:
        counter += 1
    else:
        edge_not_found.append(edge)
print "Edges found: ", counter
print "Not found: ", len(edge_not_found)

And I have been getting these results:

How many edges in old graph:  3151
How many edges in new graph:  11245
Edges found:  1601
Not found:  9644

I can't understand why I am getting 1601 found instead of 11245-3151 = 8094

Any ideas?

Thank you!

TL/DR: There's a simple explanation for what you see, and if you get to the end, there is a much shorter way to write your code (with a lot of explanation along the way).


First note that it looks like Edges found is intended to be the number of edges that are in both H and G . So it should only have 3151, not 8094. 8094 should be Not found . Note that the number of edges found, 1601, is about half the number that you would expect. That makes sense because:

I believe the problem you are having is that when networkx lists out the edges an edge might appear as (a,b) in edges_before . However in edges_after , it might appear in the list as (b,a) .

So (b,a) will not be in edges_before . It will fail your test. Assuming the edge orders aren't correlated between when they are listed for H and G , you'd expect to find about half of them pass. You can do a different test to see if (b,a) is an edge of H . This is H.has_edge(b,a)

A straightforward improvement:

for edge in edges_all:
    if H.has_edge(edge[0],edge[1]):
        counter += 1
    else:
        edge_not_found.append(edge)

This lets you avoid even defining edges_before .

You can also avoid defining edges_all through a better improvement:

for edge in G.edges_iter(data=True):
    if H.has_edge(edge[0],edge[1]):
        etc

Note: I've written it as H.has_edge(edge[0],edge[1]) to make clear what's happening. A more sophisticated way to write it is H.has_edge(*edge) . The *edge notation unpacks the tuple .

Finally, using a list comprehension gives a better way to get edge_not_found:

edge_not_found = [edge for edge in G.edges_iter(data=True) if not H.has_edge(*edge)]

This creates a list made up of edge s which are in G but not in H .

Putting this all together (and using the .size() command to count edges in a network), we arrive at a cleaner version:

print "How many edges in old graph: ", H.size()
print "How many edges in new graph: ", G.size()
edge_not_found = [edge for edge in G.edges_iter(data=True) if not H.has_edge(*edge)]
print "Not found: ", len(edge_not_found)
print "Edges found: ", G.size()-len(edge_not_found)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM