
Distance between two vertices in igraph

I have a large (half a million edges), weighted, undirected graph and I want to find the distance between two nodes u and v. I could use my_graph.shortest_paths(u, v, weights='length') to get the distance, but this is really slow.

I can also first find the path and then sum the lengths of its edges. This is fast, but I don't understand why it is faster than calculating the distance directly.

In networkx I used nx.shortest_path_length(my_graph, u, v, weight='length').

I used this code to measure the speed. For anyone who wants to run the code, I put the edgelist on Google Drive here

import pandas as pd
import networkx as nx
import igraph
import time

# load edgelist
edgelist = pd.read_pickle('edgelist.pkl')

# create igraph
tuples = [tuple(x) for x in edgelist[['u', 'v', 'length']].values]
graph_igraph = igraph.Graph.TupleList(tuples, directed=False, edge_attrs=['length'])

# create nx graph
graph_nx = nx.from_pandas_edgelist(edgelist, source='u', target='v', edge_attr=True)


def distance_shortest_path(u, v):
    # shortest_paths returns a matrix (list of rows); take the single entry
    return graph_igraph.shortest_paths(u, v, weights='length')[0][0]

get_length = lambda edge: graph_igraph.es[edge]['length']
def distance_path_then_sum(u, v):
    path = graph_igraph.get_shortest_paths(u, v, weights='length', output='epath')[0]
    return sum(map(get_length, path))

def distance_nx(u, v):
    return nx.shortest_path_length(graph_nx, u, v, weight='length')


some_nodes = [
    'Delitzsch unt Bf',
    'Neustadt(Holst)Gbf',
    'Delitzsch ob Bf',
    'Karlshagen',
    'Berlin-Karlshorst (S)',
    'Köln/Bonn Flughafen',
    'Mannheim Hbf',
    'Neu-Edingen/Friedrichsfeld',
    'Ladenburg',
    'Heddesheim/Hirschberg',
    'Weinheim-Lützelsachsen',
    'Wünsdorf-Waldstadt',
    'Zossen',
    'Dabendorf',
    'Rangsdorf',
    'Dahlewitz',
    'Blankenfelde(Teltow-Fläming)',
    'Berlin-Schönefeld Flughafen',
    'Berlin Ostkreuz',
]

print('distance_shortest_path ', end='')
start = time.time()
for node in some_nodes:
    distance_shortest_path('Köln Hbf', node)
print('took', time.time() - start)

print('distance_nx ', end='')
start = time.time()
for node in some_nodes:
    distance_nx('Köln Hbf', node)
print('took', time.time() - start)

print('distance_path_then_sum ', end='')
start = time.time()
for node in some_nodes:
    distance_path_then_sum('Köln Hbf', node)
print('took', time.time() - start)

Which results in

distance_shortest_path took 46.34037733078003
distance_nx took 12.006148099899292
distance_path_then_sum took 0.9555535316467285
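
The timings above each come from a single pass measured with time.time(). As a rough sketch of a slightly more robust measurement, the helper below repeats the loop and keeps the best total, using time.perf_counter(). The names best_time and dummy_distance are hypothetical and not part of the original code; dummy_distance only stands in for one of the three distance functions so that the snippet runs on its own.

```python
import time


def best_time(fn, source, targets, repeats=3):
    """Return the smallest total wall-clock time over `repeats` passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for target in targets:
            fn(source, target)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical stand-in for distance_shortest_path, distance_nx, etc.
def dummy_distance(u, v):
    return abs(len(u) - len(v))


elapsed = best_time(dummy_distance, "Köln Hbf", ["Zossen", "Ladenburg"])
print("dummy_distance took", elapsed)
```

Passing distance_shortest_path, distance_nx or distance_path_then_sum in place of dummy_distance, with some_nodes as the targets, would time the real functions the same way.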

You can use the shortest_paths function for this in igraph. Using it is quite straightforward: suppose that G is your graph, with edge weights stored in G.es['weight'], then

D = G.shortest_paths(weights='weight')

will give you the distance matrix D as a list of lists. You can convert this to a numpy array as

import numpy as np
D = np.array(list(D))

To obtain the distance between only a specific pair of (sets of) nodes, you can specify the source and target arguments of shortest_paths.
