
Python: all simple paths in a directed graph

I am working with a number of directed graphs with no cycles in them, and I need to find all simple paths between any two nodes. Normally I wouldn't worry about execution time, but I have to do this for very many nodes over very many timesteps, since I am dealing with a time-based simulation.

In the past I tried the facilities offered by NetworkX, but in general I found them slower than my approach. I'm not sure whether anything has changed lately.

I have implemented this recursive function:

import timeit

def all_simple_paths(adjlist, start, end, path):

    path = path + [start]

    if start == end:
        return [path]

    paths = []

    for child in adjlist[start]:

        if child not in path:

            child_paths = all_simple_paths(adjlist, child, end, path)
            paths.extend(child_paths)

    return paths


# Load the adjacency list; the file holds a single Python literal.
with open('digraph.txt', 'rt') as fid:
    adjlist = eval(fid.read().strip())

number = 1000
stmnt  = 'all_simple_paths(adjlist, 166, 180, [])'
setup  = 'from __main__ import all_simple_paths, adjlist'
elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number
print('Elapsed: %0.2f ms' % (1000*elapsed))

On my computer, I get an average of 1.5 ms per iteration. I know this is a small number, but I have to do this operation very many times.

In case you're interested, I have uploaded a small file containing the adjacency list here:

adjlist

I am using adjacency lists as inputs, coming from a NetworkX DiGraph representation.

Any suggestions for improving the algorithm (i.e., does it have to be recursive?) or other approaches I could try are more than welcome.

Thank you.

Andrea.

You can save time without changing the algorithm's logic by caching the results of shared sub-problems.

For example, calling all_simple_paths(adjlist, 'A', 'D', []) on the following graph will compute all_simple_paths(adjlist, 'D', 'E', []) multiple times:

[figure: example graph]

Python has a built-in decorator, functools.lru_cache, for this task. It hashes the arguments to memoize results, so you will need to change adjlist and path to tuples, since a list is not hashable.
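To make the hashability point concrete, here is a minimal sketch. The function count_children and the three-node graph are hypothetical, made up only for illustration; the only library call assumed is functools.lru_cache itself.

```python
import functools

@functools.lru_cache(maxsize=None)
def count_children(adjlist, node):
    # Every argument becomes part of the cache key, so each must be hashable.
    return len(adjlist[node])

# Hypothetical 3-node graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {}
adj_as_lists = [[1, 2], [2], []]
adj_as_tuples = tuple(tuple(children) for children in adj_as_lists)

try:
    count_children(adj_as_lists, 0)   # a list cannot be hashed
    list_ok = True
except TypeError:
    list_ok = False

tuple_result = count_children(adj_as_tuples, 0)
print(list_ok, tuple_result)
```

Passing the nested list raises TypeError, while the tuple-of-tuples version is accepted and cached.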

import timeit
import functools

@functools.lru_cache()
def all_simple_paths(adjlist, start, end, path):

    path = path + (start,)

    if start == end:
        return [path]

    paths = []

    for child in adjlist[start]:

        if child not in path:

            # adjlist is already a tuple of tuples, so it can be passed through unchanged
            child_paths = all_simple_paths(adjlist, child, end, path)
            paths.extend(child_paths)

    return paths


with open('digraph.txt', 'rt') as fid:
    adjlist = eval(fid.read().strip())

# convert to nested tuples so the argument is hashable
# (you could also store the data in this format in the text file)
adjlist = tuple(tuple(pair) for pair in adjlist)

number = 1000
stmnt  = 'all_simple_paths(adjlist, 166, 180, ())'
setup  = 'from __main__ import all_simple_paths, adjlist'
elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number
print('Elapsed: %0.2f ms'%(1000*elapsed))

Running time on my machine:
- original: 0.86 ms
- with cache: 0.01 ms
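One thing worth checking when reading these numbers: timeit runs the same statement 1000 times, and after the first run lru_cache can return the stored top-level result directly. Below is a sketch of how one might separate the two cases using cache_clear() and cache_info(), both of which lru_cache provides. The small graph and the helper names cold_call and warm_call are assumptions for illustration, not the contents of digraph.txt.

```python
import functools
import timeit

@functools.lru_cache(maxsize=None)
def all_simple_paths(adjlist, start, end, path):
    path = path + (start,)
    if start == end:
        return [path]
    paths = []
    for child in adjlist[start]:
        if child not in path:
            paths.extend(all_simple_paths(adjlist, child, end, path))
    return paths

# Hypothetical DAG: 0 -> {1, 2}, 1 -> 3, 2 -> 3, 3 -> 4
adjlist = ((1, 2), (3,), (3,), (4,), ())

def cold_call():
    # Empty the cache first, so the whole search is recomputed.
    all_simple_paths.cache_clear()
    return all_simple_paths(adjlist, 0, 4, ())

def warm_call():
    # Same arguments as the previous call, so the stored result is returned.
    return all_simple_paths(adjlist, 0, 4, ())

# Cache statistics after exactly one cold call.
cold_call()
info = all_simple_paths.cache_info()
print(info)

number = 1000
cold = timeit.timeit(cold_call, number=number) / number
warm = timeit.timeit(warm_call, number=number) / number
print('cold: %0.4f ms, warm: %0.4f ms' % (1000 * cold, 1000 * warm))
```

On this toy graph the statistics after a single cold call show no cache hits: path is part of the cache key, and every recursive call receives a different path, so node 3 is expanded once per route that reaches it.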

Note that this method only pays off when there are a lot of shared sub-problems.
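Since the question states that the graphs have no cycles, one possible variant is to cache on the (start, end) pair alone, so that the paths from a node to the target are computed once no matter how many routes lead into that node. This is a sketch under that assumption, not the code timed above; make_dag_path_finder and the sample graph are hypothetical names and data.

```python
import functools

def make_dag_path_finder(adjlist):
    """Return a function paths(start, end) memoized per (start, end) pair.

    Assumes adjlist describes a DAG. There is no visited check, so a graph
    with a cycle would recurse until Python's recursion limit is hit.
    """
    @functools.lru_cache(maxsize=None)
    def paths(start, end):
        if start == end:
            return ((start,),)
        found = []
        for child in adjlist[start]:
            # Reuse the cached tails from child to end and prepend start.
            for tail in paths(child, end):
                found.append((start,) + tail)
        # Return a tuple so that cached results cannot be mutated by callers.
        return tuple(found)
    return paths

# Hypothetical DAG: 0 -> {1, 2}, 1 -> 3, 2 -> 3, 3 -> 4
adjlist = [[1, 2], [3], [3], [4], []]
paths = make_dag_path_finder(adjlist)
result = paths(0, 4)
print(result)
print(paths.cache_info())
```

Because the adjacency list is captured by the closure rather than passed as an argument, it does not need to be hashable. The cache belongs to one graph, so a new finder would have to be built whenever the graph changes, for example at each timestep if edges are added or removed.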
