I have a large, undirected, weighted graph (about half a million edges) and I want to find the distance between two nodes u and v. I could use my_graph.shortest_paths(u, v, weights='length') to get the distance, but this is really slow.
I can also find the path first and then sum the lengths of its edges. This is fast, but I don't understand why it is faster than computing the distance directly.
In networkx I used nx.shortest_path_length(my_graph, u, v, weight='length').
I used the following code to measure the speed. For anyone who wants to run it, I put the edgelist on Google Drive here
import pandas as pd
import networkx as nx
import igraph
import time
# load edgelist
edgelist = pd.read_pickle('edgelist.pkl')
# create igraph
tuples = [tuple(x) for x in edgelist[['u', 'v', 'length']].values]
graph_igraph = igraph.Graph.TupleList(tuples, directed=False, edge_attrs=['length'])
# create nx graph
graph_nx = nx.from_pandas_edgelist(edgelist, source='u', target='v', edge_attr=True)
def distance_shortest_path(u, v):
    return graph_igraph.shortest_paths(u, v, weights='length')[0]

get_length = lambda edge: graph_igraph.es[edge]['length']

def distance_path_then_sum(u, v):
    path = graph_igraph.get_shortest_paths(u, v, weights='length', output='epath')[0]
    return sum(map(get_length, path))

def distance_nx(u, v):
    return nx.shortest_path_length(graph_nx, u, v, weight='length')

some_nodes = [
    'Delitzsch unt Bf',
    'Neustadt(Holst)Gbf',
    'Delitzsch ob Bf',
    'Karlshagen',
    'Berlin-Karlshorst (S)',
    'Köln/Bonn Flughafen',
    'Mannheim Hbf',
    'Neu-Edingen/Friedrichsfeld',
    'Ladenburg',
    'Heddesheim/Hirschberg',
    'Weinheim-Lützelsachsen',
    'Wünsdorf-Waldstadt',
    'Zossen',
    'Dabendorf',
    'Rangsdorf',
    'Dahlewitz',
    'Blankenfelde(Teltow-Fläming)',
    'Berlin-Schönefeld Flughafen',
    'Berlin Ostkreuz',
]
print('distance_shortest_path ', end='')
start = time.time()
for node in some_nodes:
    distance_shortest_path('Köln Hbf', node)
print('took', time.time() - start)

print('distance_nx ', end='')
start = time.time()
for node in some_nodes:
    distance_nx('Köln Hbf', node)
print('took', time.time() - start)

print('distance_path_then_sum ', end='')
start = time.time()
for node in some_nodes:
    distance_path_then_sum('Köln Hbf', node)
print('took', time.time() - start)
Which results in
distance_shortest_path took 46.34037733078003
distance_nx took 12.006148099899292
distance_path_then_sum took 0.9555535316467285
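As a sanity check on the "find the path, then sum its edge lengths" idea, here is a minimal pure-Python sketch on a toy graph. The graph, the `dijkstra` helper, and all names in it are made up for illustration; this is not how igraph or networkx implement their calls, and it says nothing about the timings above. It only shows that both routes produce the same number.

```python
import heapq

# Hypothetical toy graph as an undirected adjacency dict: node -> {neighbour: length}.
adj = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "C": 2.0},
    "C": {"A": 5.0, "B": 2.0, "D": 1.0},
    "D": {"C": 1.0},
}

def dijkstra(adj, source, target):
    """Return (distance, node path) from source to target using Dijkstra's algorithm."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry
        settled.add(node)
        if node == target:
            break  # target settled, its distance is final
        for nbr, length in adj[node].items():
            candidate = d + length
            if candidate < dist.get(nbr, float("inf")):
                dist[nbr] = candidate
                prev[nbr] = node
                heapq.heappush(heap, (candidate, nbr))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path

distance, path = dijkstra(adj, "A", "D")
# Sum the edge lengths along the returned path.
summed = sum(adj[a][b] for a, b in zip(path, path[1:]))
print(distance, path, summed)
```
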
You can use the shortest_paths function for this in igraph. Using it is quite straightforward: suppose that G is your graph, with edge weights stored in G.es['weight']; then
D = G.shortest_paths(weights='weight')
will give you an igraph matrix D. You can convert this to a numpy array as
D = np.array(list(D))
To obtain the distance between only a specific pair of (sets of) nodes, you can specify the source and target arguments of shortest_paths.