
DFS to find all possible paths is very slow

I wrote a DFS-like algorithm to find all possible paths starting from the zero level. With 2,000 nodes and 5,000 edges, the code below executes extremely slowly. Any suggestions for this algorithm?

    all_path = []

    def printAllPathsUntil(s, path):
        path.append(s)
        if s not in adj or len(adj[s]) <= 0:  # s is a leaf: record the path
            all_path.append(path[:])          # EDIT2: append a copy, not the list itself
        else:
            for i in adj[s]:
                printAllPathsUntil(i, path)
        path.pop()                            # backtrack

    for point in points_in_start:
        path = []
        printAllPathsUntil(point, path)

And adj holds the edges: start position as key and destination list as value.

    points_in_start = [0, 3, 7]
    adj = {0: [1, 8],
           1: [2, 5],
           2: [],
           3: [2, 4],
           4: [],
           5: [6],
           6: [],
           7: [6],
           8: [2]
           }
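For reference, running the question's code on this sample input (copied here so the snippet is self-contained) yields six root-to-leaf paths:

```python
# Self-contained copy of the question's DFS plus its sample input.
points_in_start = [0, 3, 7]
adj = {0: [1, 8], 1: [2, 5], 2: [], 3: [2, 4], 4: [],
       5: [6], 6: [], 7: [6], 8: [2]}

all_path = []

def printAllPathsUntil(s, path):
    path.append(s)
    if s not in adj or len(adj[s]) <= 0:  # leaf: record a copy of the path
        all_path.append(path[:])
    else:
        for i in adj[s]:
            printAllPathsUntil(i, path)
    path.pop()                            # backtrack

for point in points_in_start:
    printAllPathsUntil(point, [])

print(all_path)
# [[0, 1, 2], [0, 1, 5, 6], [0, 8, 2], [3, 2], [3, 4], [7, 6]]
```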

EDIT1

  • This is a DAG. No cycles.


The problem with your algorithm is that it does a lot of repeated work. This is not the case in your example, since the only node reached from two other nodes is a leaf node, like C, but imagine an edge from D to B: the entire sub-graph starting at B would be visited again! For a graph with 2,000 nodes, this results in a significant slow-down.
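To see the blow-up concretely, here is a minimal sketch using a hypothetical "diamond chain" graph (not from the question): each fork splits into two branches that rejoin two steps later, so a chain of n diamonds has 2**n root-to-leaf paths, and a naive DFS re-enters every shared node once per path through it.

```python
def diamond_chain(n):
    """Build n diamonds in a row: node 3k forks to 3k+1 and 3k+2,
    which both rejoin at 3k+3."""
    adj = {}
    for i in range(0, 3 * n, 3):
        adj[i] = [i + 1, i + 2]      # fork
        adj[i + 1] = [i + 3]         # both branches rejoin at i + 3
        adj[i + 2] = [i + 3]
    return adj

calls = 0  # counts DFS invocations, i.e. node visits

def count_paths(adj, s):
    """Count root-to-leaf paths the way the naive DFS enumerates them."""
    global calls
    calls += 1
    children = adj.get(s, [])
    if not children:                 # leaf: exactly one path ends here
        return 1
    return sum(count_paths(adj, c) for c in children)

adj = diamond_chain(10)              # only 31 nodes...
print(count_paths(adj, 0))           # ...but 2**10 = 1024 paths
print(calls)                         # 4093 DFS calls for 31 nodes
```

The node count grows linearly but the visit count grows exponentially, which is exactly what memoization avoids.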

To counter this, you can use memoization, but this means you have to restructure your algorithm: instead of appending to the existing path and then adding that path to all_paths, it has to return the (partial) paths starting at the current node and let the parent node combine those into full paths. You can then use functools.lru_cache to re-use all those partial results when you reach B again coming from another node.

from functools import lru_cache

@lru_cache(None)                     # cache the list of paths for each node
def getAllPathsUntil(s):
    if s not in adj or not adj[s]:   # leaf: the only path is the node itself
        return [[s]]
    else:
        return [[s, *p] for a in adj[s]
                        for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))
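A quick self-contained check (re-using the question's sample input) confirms the memoized version returns the same six paths, and cache_info() shows the cache really is hit each time a shared node like 2 or 6 is reached again:

```python
from functools import lru_cache

points_in_start = [0, 3, 7]
adj = {0: [1, 8], 1: [2, 5], 2: [], 3: [2, 4], 4: [],
       5: [6], 6: [], 7: [6], 8: [2]}

@lru_cache(None)
def getAllPathsUntil(s):
    if s not in adj or not adj[s]:
        return [[s]]
    return [[s, *p] for a in adj[s] for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))

print(all_paths)
# [[0, 1, 2], [0, 1, 5, 6], [0, 8, 2], [3, 2], [3, 4], [7, 6]]
print(getAllPathsUntil.cache_info().hits)  # 3: node 2 twice, node 6 once
```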

As pointed out already in the comments and other answers, remembering the downstream paths of previously visited nodes is an area of optimization.

Here's my attempt at implementing that.

Here, downstream_paths is a dictionary where we remember, for each visited non-leaf node, the downstream paths from that node.

I have included %%timeit results towards the end for a small test case containing one instance of a re-visited non-leaf node. Since my test case has only one such re-visit, the improvement is modest; in your large-scale dataset, the performance gap should be wider.

Input data:

points_in_start = [0, 3, 7]
adj = {0: [1, 8],
       1: [2, 5],
       2: [10],    # Non-leaf node "2" is a child of both "8" and "3"
       3: [2, 4],
       4: [],
       5: [6],
       6: [],
       7: [6],
       8: [2],

       10: [11, 18],
       11: [12, 15],
       12: [],
       15: [16],
       16: [],
       18: [12]
      }

The modified code:

%%timeit

downstream_paths = {}                                 # Maps each node to its
                                                      # list of downstream paths
                                                      # starting with that node.

def getPathsToLeafsFrom(s):      # Returns list of downstream paths starting from s
                                 # and ending in some leaf node.
    children = adj.get(s, [])
    if not children:                                  # s is a Leaf
        paths_from_s = [[s]]
    else:                                             # s is a Non-leaf
        ds_paths = downstream_paths.get(s, [])        # Check if s was previously visited
        if ds_paths:                                  # If s was previously visited.
            paths_from_s = ds_paths
        else:                                         # s was not visited earlier.
            paths_from_s = []                         # Initialize
            for child in children:
                paths_from_child = getPathsToLeafsFrom(child)   # Recurse for each child
                for p in paths_from_child:
                    paths_from_s.append([s] + p)
            downstream_paths[s] = paths_from_s       # Cache this, to use when s is re-visited
    return paths_from_s

all_path = []
for point in points_in_start:
    all_path.extend(getPathsToLeafsFrom(point))

Output:

from pprint import pprint
pprint (all_path)

[[0, 1, 2, 10, 11, 12],
 [0, 1, 2, 10, 11, 15, 16],
 [0, 1, 2, 10, 18, 12],
 [0, 1, 5, 6],
 [0, 8, 2, 10, 11, 12],
 [0, 8, 2, 10, 11, 15, 16],
 [0, 8, 2, 10, 18, 12],
 [3, 2, 10, 11, 12],
 [3, 2, 10, 11, 15, 16],
 [3, 2, 10, 18, 12],
 [3, 4],
 [7, 6]]

Timing results: Original posted code:

10000 loops, best of 3: 63 µs per loop

Timing results: Optimized code:

10000 loops, best of 3: 43.2 µs per loop
