
DFS to find all possible paths is very slow

I wrote a DFS-like algorithm to find all possible paths starting from the zero level. With 2,000 nodes and 5,000 edges, the code below executes extremely slowly. Any suggestions for this algorithm?

    all_path = []

    def printAllPathsUntil(s, path):
        path.append(s)
        if s not in adj or len(adj[s]) <= 0:  # s is a leaf: record the path
            all_path.append(path[:])          # EDIT2: append a copy, not the list itself
        else:
            for i in adj[s]:
                printAllPathsUntil(i, path)
        path.pop()                            # backtrack

    for point in points_in_start:
        path = []
        printAllPathsUntil(point, path)

And adj holds the edges: start position as key and destination list as value.

    points_in_start = [0, 3, 7]
    adj = {0: [1, 8],
           1: [2, 5],
           2: [],
           3: [2, 4],
           4: [],
           5: [6],
           6: [],
           7: [6],
           8: [2]
           }
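For reference, running the question's code on this sample input (copied here so the snippet is self-contained) yields six root-to-leaf paths:

```python
# Self-contained copy of the question's DFS plus its sample input.
points_in_start = [0, 3, 7]
adj = {0: [1, 8], 1: [2, 5], 2: [], 3: [2, 4], 4: [],
       5: [6], 6: [], 7: [6], 8: [2]}

all_path = []

def printAllPathsUntil(s, path):
    path.append(s)
    if s not in adj or len(adj[s]) <= 0:  # leaf: record a copy of the path
        all_path.append(path[:])
    else:
        for i in adj[s]:
            printAllPathsUntil(i, path)
    path.pop()                            # backtrack

for point in points_in_start:
    printAllPathsUntil(point, [])

print(all_path)
# [[0, 1, 2], [0, 1, 5, 6], [0, 8, 2], [3, 2], [3, 4], [7, 6]]
```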

EDIT1

  • This is a DAG. No cycles.


The problem with your algorithm is that it does a lot of repeated work. This is not the case in your example, since the only node reached from two other nodes is a leaf node, like C, but imagine an edge from D to B: the entire sub-graph starting at B would be visited again! For a graph with 2,000 nodes, this results in a significant slow-down.
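To see the blow-up concretely, here is a minimal sketch using a hypothetical "diamond chain" graph (not from the question): each fork splits into two branches that rejoin two steps later, so a chain of n diamonds has 2**n root-to-leaf paths, and a naive DFS re-enters every shared node once per path through it.

```python
def diamond_chain(n):
    """Build n diamonds in a row: node 3k forks to 3k+1 and 3k+2,
    which both rejoin at 3k+3."""
    adj = {}
    for i in range(0, 3 * n, 3):
        adj[i] = [i + 1, i + 2]      # fork
        adj[i + 1] = [i + 3]         # both branches rejoin at i + 3
        adj[i + 2] = [i + 3]
    return adj

calls = 0  # counts DFS invocations, i.e. node visits

def count_paths(adj, s):
    """Count root-to-leaf paths the way the naive DFS enumerates them."""
    global calls
    calls += 1
    children = adj.get(s, [])
    if not children:                 # leaf: exactly one path ends here
        return 1
    return sum(count_paths(adj, c) for c in children)

adj = diamond_chain(10)              # only 31 nodes...
print(count_paths(adj, 0))           # ...but 2**10 = 1024 paths
print(calls)                         # 4093 DFS calls for 31 nodes
```

The node count grows linearly but the visit count grows exponentially, which is exactly what memoization avoids.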

To counter this, you can use memoization, but this means you have to restructure your algorithm: instead of appending to the existing path and then adding that path to all_paths, it has to return the (partial) paths starting at the current node and let the parent node combine those into full paths. You can then use functools.lru_cache to re-use all those partial results when you reach B again coming from another node.

from functools import lru_cache

@lru_cache(None)                     # cache the list of paths for each node
def getAllPathsUntil(s):
    if s not in adj or not adj[s]:   # leaf: the only path is the node itself
        return [[s]]
    else:
        return [[s, *p] for a in adj[s]
                        for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))
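A quick self-contained check (re-using the question's sample input) confirms the memoized version returns the same six paths, and cache_info() shows the cache really is hit each time a shared node like 2 or 6 is reached again:

```python
from functools import lru_cache

points_in_start = [0, 3, 7]
adj = {0: [1, 8], 1: [2, 5], 2: [], 3: [2, 4], 4: [],
       5: [6], 6: [], 7: [6], 8: [2]}

@lru_cache(None)
def getAllPathsUntil(s):
    if s not in adj or not adj[s]:
        return [[s]]
    return [[s, *p] for a in adj[s] for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))

print(all_paths)
# [[0, 1, 2], [0, 1, 5, 6], [0, 8, 2], [3, 2], [3, 4], [7, 6]]
print(getAllPathsUntil.cache_info().hits)  # 3: node 2 twice, node 6 once
```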

As pointed out already in the comments and other answers, remembering the downstream paths of previously visited nodes is an area of optimization.

Here's my attempt at implementing that.

Here, downstream_paths is a dictionary where we remember, for each visited non-leaf node, the downstream paths from that node.

I have included %%timeit results towards the end for a small test case containing one instance of a re-visited non-leaf node. Since my test case has only one such re-visit, the improvement is modest; in your large-scale dataset, the performance gap should be wider.

Input data:

points_in_start = [0, 3, 7]
adj = {0: [1, 8],
       1: [2, 5],
       2: [10],    # Non-leaf node "2" is a child of both "8" and "3"
       3: [2, 4],
       4: [],
       5: [6],
       6: [],
       7: [6],
       8: [2],

       10: [11, 18],
       11: [12, 15],
       12: [],
       15: [16],
       16: [],
       18: [12]
      }

The modified code:

%%timeit

downstream_paths = {}                                 # Maps each node to its
                                                      # list of downstream paths
                                                      # starting with that node.

def getPathsToLeafsFrom(s):      # Returns list of downstream paths starting from s
                                 # and ending in some leaf node.
    children = adj.get(s, [])
    if not children:                                  # s is a Leaf
        paths_from_s = [[s]]
    else:                                             # s is a Non-leaf
        ds_paths = downstream_paths.get(s, [])        # Check if s was previously visited
        if ds_paths:                                  # If s was previously visited.
            paths_from_s = ds_paths
        else:                                         # s was not visited earlier.
            paths_from_s = []                         # Initialize
            for child in children:
                paths_from_child = getPathsToLeafsFrom(child)   # Recurse for each child
                for p in paths_from_child:
                    paths_from_s.append([s] + p)
            downstream_paths[s] = paths_from_s       # Cache this, to use when s is re-visited
    return paths_from_s

all_path = []
for point in points_in_start:
    all_path.extend(getPathsToLeafsFrom(point))

Output:

from pprint import pprint
pprint (all_path)

[[0, 1, 2, 10, 11, 12],
 [0, 1, 2, 10, 11, 15, 16],
 [0, 1, 2, 10, 18, 12],
 [0, 1, 5, 6],
 [0, 8, 2, 10, 11, 12],
 [0, 8, 2, 10, 11, 15, 16],
 [0, 8, 2, 10, 18, 12],
 [3, 2, 10, 11, 12],
 [3, 2, 10, 11, 15, 16],
 [3, 2, 10, 18, 12],
 [3, 4],
 [7, 6]]

Timing results: Original posted code:

10000 loops, best of 3: 63 µs per loop

Timing results: Optimized code:

10000 loops, best of 3: 43.2 µs per loop
