
Unweighted directed graph distances

Let's say I have an unweighted directed graph. I was wondering if there is a way to store the distances between a starting node and all the remaining nodes of the graph. I know Dijkstra's algorithm could be an option, but I'm not sure it would be the best one, since the graph is pretty big (~100k nodes) and unweighted. My thought so far was to perform a BFS and store the distances along the way. Is this a feasible approach?

Finally, since I'm pretty new to graph theory, could someone point me in the right direction for a good Python implementation of this kind of problem?

Definitely feasible, and pretty fast if your data structure maps each starting node's identifier to the list of its end nodes (an adjacency list).

Here's an example using a dictionary for edges: {startNode:list of end nodes}

from collections import deque
maxDistance = 0
def getDistances(origin,edges):
    global maxDistance
    maxDistance  = 0
    distances = {origin:0}         # {endNode:distance from origin}
    toLink    = deque([origin])    # start at origin (distance=0)
    while toLink:
        start = toLink.popleft()     # previous end, will chain to next
        dist  = distances[start] + 1 # new next are at +1
        for end in edges.get(start, ()):        # next end nodes (none if start has no outgoing edges)
            if end in distances: continue       # new ones only
            distances[end] = dist               # record distance
            toLink.append(end)                  # will link from there
            maxDistance = max(maxDistance,dist)      
            
    return distances

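This dequeues each reachable node exactly once and examines each of its outgoing edges once, so the running time is O(V + E) over the reachable part of the graph. Unreachable nodes are never visited and do not appear in the result. Dictionary lookups keep the "already seen" check and the distance lookup at constant time on average.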
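As a quick sanity check, here is a minimal sketch of the same BFS on a six-node graph. The names `build_adjacency` and `bfs_distances` and the sample edges are assumptions made for illustration, not part of the answer above. This variant drops the global and derives the maximum distance from the returned dictionary.

```python
from collections import deque

def build_adjacency(edge_list):
    """Group directed (start, end) pairs into {start: [end, ...]}."""
    adjacency = {}
    for start, end in edge_list:
        adjacency.setdefault(start, []).append(end)
    return adjacency

def bfs_distances(origin, adjacency):
    """Return {node: hop count from origin} for every reachable node."""
    distances = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        # .get() avoids a KeyError for nodes that have no outgoing edges
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances

# Hypothetical sample data: "f" points at "a" but cannot be reached from "a"
example_edges = [("a", "b"), ("a", "c"), ("b", "d"),
                 ("c", "d"), ("d", "e"), ("f", "a")]
example_distances = bfs_distances("a", build_adjacency(example_edges))
print(example_distances)                 # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 3}
print(max(example_distances.values()))   # 3
```

Because the edges are directed, "f" is missing from the result even though an edge leaves it toward the origin.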

Using some random test data (10 million edges)...

import random
from collections import defaultdict

print("loading simulated graphs")
vertexCount = 100000
edgeCount   = vertexCount * 100
edges       = defaultdict(set)
edgesLoaded = 0
minSpan     = 1 # vertexCount//2
while edgesLoaded<edgeCount:
    start = random.randrange(vertexCount)
    end   = random.randrange(vertexCount)
    if abs(start-end) > minSpan and end not in edges[start]:
        edges[start].add(end)
        edgesLoaded += 1
print("loaded!")

Performance:

# starting from a randomly selected node
origin    = random.choice(list(edges.keys())) 

from timeit import timeit
t = timeit(lambda:getDistances(origin,edges),number=1)

print(f"{t:.2f} seconds for",edgeCount,"edges", "max distance = ",maxDistance)

# 3.06 seconds for 10000000 edges max distance =  4        
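Since the test data labels nodes with integers in range(vertexCount), the distances could also be stored in a flat list instead of a dictionary. The following is only a sketch under that assumption. The name `bfs_distances_list` and the -1 marker for unreachable nodes are illustrative choices, and this variant has not been benchmarked against the function above.

```python
from collections import deque

def bfs_distances_list(origin, adjacency, node_count):
    """BFS over integer node ids 0..node_count-1; -1 marks unreachable nodes."""
    distances = [-1] * node_count
    distances[origin] = 0
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        next_dist = distances[node] + 1
        # .get() tolerates nodes that have no outgoing edges
        for neighbour in adjacency.get(node, ()):
            if distances[neighbour] == -1:
                distances[neighbour] = next_dist
                queue.append(neighbour)
    return distances

# Hypothetical five-node graph: node 4 has an edge to 0 but is unreachable from 0
small_graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}
result = bfs_distances_list(0, small_graph, 5)
print(result)   # [0, 1, 1, 2, -1]
```

Unlike the dictionary version, this one reports every node, so callers have to filter out the -1 entries themselves.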
