I have been trying to build a graph for a project and I have been trying to identify newly added edges after populating it with more information.
For instance below you can see its first and second iteration:
---------------------- General Info Graph H-----------------------------
Total number of Nodes in Graph: 2364
Total number of Edges: 3151
---------------------- General Info Graph G -----------------------------
Total number of Nodes in Graph: 6035
Total number of Edges: 11245
The problem I have been facing is when I try to identify newly added edges using the code:
counter = 0
edges_all = list(G.edges_iter(data=True))
edges_before = list(H.edges_iter(data=True))
print "How many edges in old graph: ", len(edges_before)
print "How many edges in new graph: ", len(edges_all)
edge_not_found = []
for edge in edges_all:
    if edge in edges_before:
        counter += 1
    else:
        edge_not_found.append(edge)
print "Edges found: ", counter
print "Not found: ", len(edge_not_found)
And I have been getting these results:
How many edges in old graph: 3151
How many edges in new graph: 11245
Edges found: 1601
Not found: 9644
I can't understand why I am getting 1601 found instead of 11245-3151 = 8094
Any ideas?
Thank you!
TL/DR: There's a simple explanation for what you see, and if you get to the end, there is a much shorter way to write your code (with a lot of explanation along the way).
First, note that Edges found is intended to be the number of edges that are in both H and G, so it should be 3151, not 8094; 8094 is what Not found should be. The number of edges actually found, 1601, is about half of what you would expect. That makes sense because:
I believe the problem is that when networkx lists the edges, an edge might appear as (a,b) in edges_before but as (b,a) in edges_all. In that case (b,a) is not in edges_before, so it fails your test. Assuming the edge orientations aren't correlated between the listings for H and G, you'd expect about half of them to pass. You can use a different test to check whether (b,a) is an edge of H: H.has_edge(b,a).
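To make the orientation issue concrete, here is a minimal sketch that uses plain tuple lists rather than networkx graphs. The edge lists are made-up toy data, and the frozenset comparison is only an illustration of an order-insensitive test for undirected edges without data dicts; it is not what networkx does internally.

```python
# Toy edge lists standing in for the edges of H and G (hypothetical data).
edges_before = [(1, 2), (2, 3), (3, 4)]
edges_all = [(2, 1), (2, 3), (4, 3), (4, 5)]

# Naive test: (2, 1) and (4, 3) are missed because the tuples are reversed.
naive_found = sum(1 for edge in edges_all if edge in edges_before)

# Orientation-insensitive test: compare unordered endpoint pairs instead.
before_pairs = set(frozenset(edge) for edge in edges_before)
found = sum(1 for edge in edges_all if frozenset(edge) in before_pairs)
not_found = [edge for edge in edges_all if frozenset(edge) not in before_pairs]
```

Here the naive membership test finds only one of the three old edges, while the unordered comparison finds all three and reports (4, 5) as the only new edge.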
A straightforward improvement:
for edge in edges_all:
    if H.has_edge(edge[0],edge[1]):
        counter += 1
    else:
        edge_not_found.append(edge)
This lets you avoid even defining edges_before.
You can also avoid defining edges_all through a better improvement:
for edge in G.edges_iter(data=True):
    if H.has_edge(edge[0],edge[1]):
        counter += 1
    else:
        edge_not_found.append(edge)
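Note: I've written it as H.has_edge(edge[0],edge[1]) to make clear what's happening. A more compact way to write it is H.has_edge(*edge[:2]). The * notation unpacks the tuple into separate arguments; the [:2] slice is needed because with data=True each edge is a 3-tuple (u, v, data), and has_edge on a plain Graph accepts only the two endpoints.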
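As a small illustration of tuple unpacking, the sketch below uses a hypothetical two-argument has_edge function as a stand-in for H.has_edge (it is not the networkx implementation). It shows that unpacking a 2-tuple works, while unpacking a 3-tuple that carries a data dict passes one argument too many unless the tuple is sliced first.

```python
# Hypothetical stand-in: one known undirected edge between nodes 1 and 2.
known = set([frozenset((1, 2))])

def has_edge(u, v):
    """Two-argument check, mimicking the shape of H.has_edge(u, v)."""
    return frozenset((u, v)) in known

plain_edge = (2, 1)
edge_with_data = (2, 1, {"weight": 3})

result_plain = has_edge(*plain_edge)           # same as has_edge(2, 1)
result_sliced = has_edge(*edge_with_data[:2])  # drop the data dict first

try:
    has_edge(*edge_with_data)  # three arguments for a two-argument function
    unpack_failed = False
except TypeError:
    unpack_failed = True
```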
Finally, a list comprehension gives a more concise way to get edge_not_found (the [:2] slice drops the data dict so that only the two endpoints are passed to has_edge):
edge_not_found = [edge for edge in G.edges_iter(data=True) if not H.has_edge(*edge[:2])]
This creates a list of the edges that are in G but not in H.
Putting this all together (and using the .size() method to count edges in a network), we arrive at a cleaner version:
print "How many edges in old graph: ", H.size()
print "How many edges in new graph: ", G.size()
edge_not_found = [edge for edge in G.edges_iter(data=True) if not H.has_edge(*edge[:2])]
print "Not found: ", len(edge_not_found)
print "Edges found: ", G.size()-len(edge_not_found)