Python: all simple paths in a directed graph
I am working with a number of directed graphs that contain no cycles, and I need to find all simple paths between any two nodes. In general I wouldn't worry about the execution time, but I have to do this for very many nodes during very many timesteps, since I am dealing with a time-based simulation.
I tried the facilities offered by NetworkX in the past, but in general I found them slower than my approach. I am not sure whether anything has changed lately.
I have implemented this recursive function:
import timeit

def all_simple_paths(adjlist, start, end, path):
    # Extend the path walked so far with the node being visited.
    path = path + [start]
    if start == end:
        return [path]
    paths = []
    for child in adjlist[start]:
        # Skip nodes already on the path so that every path stays simple.
        if child not in path:
            child_paths = all_simple_paths(adjlist, child, end, path)
            paths.extend(child_paths)
    return paths

# The file holds a Python literal describing the adjacency list.
with open('digraph.txt', 'rt') as fid:
    adjlist = eval(fid.read().strip())

number = 1000
stmnt = 'all_simple_paths(adjlist, 166, 180, [])'
setup = 'from __main__ import all_simple_paths, adjlist'
elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number
print('Elapsed: %0.2f ms' % (1000*elapsed))
On my computer, I get an average of 1.5 ms per iteration. I know this is a small number, but I have to do this operation very many times.
In case you're interested, I have uploaded a small file containing the adjacency list here:
I am using adjacency lists as inputs, coming from a NetworkX DiGraph representation.
Any suggestions for improving the algorithm (i.e., does it have to be recursive?) or other approaches I could try are more than welcome.
Thank you.
Andrea.
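On the question of whether the search has to be recursive: it does not. The sketch below replaces the call stack with an explicit list of (node, path) pairs. The name all_simple_paths_iterative and the toy graph are made up for illustration, and no claim is made that this is faster than the recursive version; it would need to be timed on the real data.

```python
def all_simple_paths_iterative(adjlist, start, end):
    """Depth-first search driven by an explicit stack instead of recursion."""
    paths = []
    # Each stack entry holds a node and the path that led to it.
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            paths.append(path)
            continue
        for child in adjlist[node]:
            # Skip nodes already on the path so every path stays simple.
            if child not in path:
                stack.append((child, path + [child]))
    return paths

# Hypothetical toy graph, not the data from digraph.txt.
toy = {0: [1, 2], 1: [3], 2: [3], 3: []}
found = all_simple_paths_iterative(toy, 0, 3)
```

The paths come back in a different order than in the recursive version, because the stack pops the most recently pushed child first.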
You can save time without changing the algorithm's logic by caching the results of shared sub-problems.
For example, calling all_simple_paths(adjlist, 'A', 'D', []) in the following graph will compute all_simple_paths(adjlist, 'D', 'E', []) multiple times:
Python has a built-in decorator, functools.lru_cache, for this task. It hashes the arguments to memoize results, so you will need to change adjlist and path to tuples, since a list is not hashable.
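To make the hashability requirement concrete, here is a small sketch using a throwaway function (total is a hypothetical name, unrelated to the path-finding code). Passing a list to an lru_cache-decorated function raises TypeError, while the same values in a tuple are cached on the second call.

```python
import functools

@functools.lru_cache(maxsize=None)
def total(values):
    # Hypothetical helper, used only to demonstrate the cache.
    return sum(values)

def list_argument_fails():
    """Return True if calling the cached function with a list raises TypeError."""
    try:
        total([1, 2, 3])
    except TypeError:
        return True
    return False

rejected = list_argument_fails()   # lists cannot be hashed into a cache key
first = total((1, 2, 3))           # computed and stored
second = total((1, 2, 3))          # served from the cache
hits = total.cache_info().hits
```

cache_info() is a quick way to check whether the cache is being hit at all in a real run.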
import timeit
import functools

@functools.lru_cache()
def all_simple_paths(adjlist, start, end, path):
    # path is a tuple here so that the arguments stay hashable.
    path = path + (start,)
    if start == end:
        return [path]
    paths = []
    for child in adjlist[start]:
        if child not in path:
            # adjlist is already a tuple of tuples, so it can be passed on as is.
            child_paths = all_simple_paths(adjlist, child, end, path)
            paths.extend(child_paths)
    return paths

with open('digraph.txt', 'rt') as fid:
    adjlist = eval(fid.read().strip())

# Convert the nested lists to tuples so they are hashable
# (you can also change your data format in the txt file).
adjlist = tuple(tuple(pair) for pair in adjlist)

number = 1000
stmnt = 'all_simple_paths(adjlist, 166, 180, ())'
setup = 'from __main__ import all_simple_paths, adjlist'
elapsed = timeit.timeit(stmnt, setup=setup, number=number)/number
print('Elapsed: %0.2f ms' % (1000*elapsed))
Running time on my machine:
- original: 0.86 ms
- with cache: 0.01 ms
Note that this method only pays off when there are a lot of shared sub-problems.
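Since the question states that the graphs have no cycles, one way to get more sharing is to leave the path prefix out of the cache key: in an acyclic graph a node can never be revisited, so the paths from a node to the end do not depend on how that node was reached. The following is only a sketch under that assumption. make_path_finder and the toy graph are made-up names and data, not the contents of digraph.txt, and the result is wrong for graphs that do contain cycles.

```python
import functools

def make_path_finder(adjlist):
    """Build a cached path finder for one acyclic adjacency structure.

    adjlist is captured by the closure, so it does not need to be hashable.
    """
    @functools.lru_cache(maxsize=None)
    def paths(start, end):
        if start == end:
            return ((start,),)
        found = []
        for child in adjlist[start]:
            # Paths from child to end are computed once and then reused.
            for tail in paths(child, end):
                found.append((start,) + tail)
        return tuple(found)
    return paths

# Hypothetical toy DAG in which the sub-problem 'D' -> 'E' is shared.
toy = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['E'], 'E': []}
find = make_path_finder(toy)
result = find('A', 'E')
shared_hits = find.cache_info().hits
```

In a time-based simulation the finder would have to be rebuilt (or cleared with find.cache_clear()) whenever the graph changes, because cached results refer to the old edges.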