DFS to find all possible paths is very slow

Question

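I wrote a DFS-like algorithm to find all possible paths starting from level zero. With 2,000 nodes and 5,000 edges, the code below runs extremely slowly. Any suggestions for this algorithm?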

    all_path = []

    def printAllPathsUntil(s, path):
        path.append(s)
        if s not in adj or len(adj[s]) <= 0:
            all_path.append(path[:]) # EDIT2
        else:
            for i in adj[s]:
                printAllPathsUntil(i, path)
        path.pop()

    for point in points_in_start:
        path = []
        printAllPathsUntil(point, path)

Here adj holds the edges: each start node is a key, and the list of its destination nodes is the value.

    points_in_start = [0, 3, 7]
    adj = {0: [1, 8],
           1: [2, 5],
           2: [],
           3: [2, 4],
           4: [],
           5: [6],
           6: [],
           7: [6],
           8: [2]
           }

EDIT1

This is a DAG. There are no cycles.

Answer 1

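The problem with your algorithm is that it does a lot of repeated work. That does not show up in your example, because the only time a node is reached from two other nodes, it is a leaf node, like C. But imagine an edge from D to B: the entire subgraph starting at B would then be visited all over again! For a graph with 2,000 nodes, this results in a significant slowdown.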
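To make the repeated work visible, here is a minimal sketch that counts how often the plain DFS enters each node. It uses the numeric graph from the question plus one hypothetical extra edge, 3 -> 1, added only for illustration in the spirit of the "edge from D to B" thought experiment; the names `count_visits`, `visits` and `repeated` are not from the original post.

```python
from collections import Counter

# Graph from the question, plus one ASSUMED extra edge 3 -> 1 so that a
# non-leaf node (1) becomes reachable from two different parents.
adj = {0: [1, 8],
       1: [2, 5],
       2: [],
       3: [2, 4, 1],   # the trailing 1 is the hypothetical extra edge
       4: [],
       5: [6],
       6: [],
       7: [6],
       8: [2]}
points_in_start = [0, 3, 7]

visits = Counter()

def count_visits(s):
    """Plain DFS without memoization; records how often each node is entered."""
    visits[s] += 1
    for child in adj.get(s, []):
        count_visits(child)

for point in points_in_start:
    count_visits(point)

# Nodes that were entered more than once.
repeated = {node: n for node, n in visits.items() if n > 1}
print(repeated)  # {1: 2, 2: 4, 5: 2, 6: 3}
```

Because node 1 is now entered twice, everything below it (2, 5 and 6) is walked again as well. In a larger graph this effect compounds at every shared non-leaf node.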

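To counter this, you can use memoization, but that means you have to restructure your algorithm: instead of appending to the existing path and then adding that path to all_paths, the function has to return the (partial) paths starting at the current node, and the caller combines them with the parent node to form full paths. You can then use functools.lru_cache to reuse all those partial results when you visit B again from another node.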

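from functools import lru_cache

@lru_cache(maxsize=None)    # unbounded cache: one entry per visited node
def getAllPathsUntil(s):
    # Leaf node (or node without an entry in adj): the only path is [s].
    if s not in adj or not adj[s]:
        return [[s]]
    # Otherwise prepend s to every partial path that starts at a child.
    return [[s, *p] for a in adj[s]
                    for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))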
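As a usage sketch, the memoized function can be run on the sample graph from the question and the cache statistics inspected with `cache_info()`, which `functools.lru_cache` attaches to the wrapped function. The `info` variable and the printed summary are additions for illustration only.

```python
from functools import lru_cache

# Sample data from the question.
points_in_start = [0, 3, 7]
adj = {0: [1, 8], 1: [2, 5], 2: [], 3: [2, 4], 4: [],
       5: [6], 6: [], 7: [6], 8: [2]}

@lru_cache(maxsize=None)
def getAllPathsUntil(s):
    children = adj.get(s, [])
    if not children:
        return [[s]]                      # leaf: a single one-node path
    return [[s, *p] for a in children for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))

# hits = calls answered from the cache, misses = calls actually computed.
info = getAllPathsUntil.cache_info()
print(len(all_paths), info.hits, info.misses)  # 6 3 9
```

Each of the 9 nodes is computed once, and the 3 hits are the repeated arrivals at nodes 2 and 6. Note that the lists returned by a cached function are the cached objects themselves, so copy a path before mutating it.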

Answer 2

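As already pointed out in the comments and in the other answer, memoizing the downstream paths of previously visited nodes is one area for optimization.

Here is my attempt at implementing that.

Here, downstream_paths is a dictionary in which we remember, for each visited non-leaf node, the downstream paths from that node.

I have included %%timeit results for a small test case that contains a "re-visited non-leaf" node. Since my test case has only one non-leaf node that gets re-visited, the improvement is modest. In your large-scale dataset, the gap in performance may well be bigger.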

Input data:

points_in_start = [0, 3, 7]
adj = {0: [1, 8],
       1: [2, 5],
       2: [10],       # Non-leaf node 2 is a child of 1, 8 and 3
       3: [2, 4],
       4: [],
       5: [6],
       6: [],
       7: [6],
       8: [2],
       10: [11, 18],
       11: [12, 15],
       12: [],
       15: [16],
       16: [],
       18: [12]
      }

Modified code:

%%timeit

downstream_paths = {}                 # Maps each visited non-leaf node to its
                                      # list of downstream paths
                                      # starting with that node.

def getPathsToLeafsFrom(s):
    # Returns the list of downstream paths starting from s
    # and ending in some leaf node.
    children = adj.get(s, [])
    if not children:                               # s is a leaf
        return [[s]]
    if s in downstream_paths:                      # s was visited earlier: reuse
        return downstream_paths[s]
    paths_from_s = []                              # s is a non-leaf seen for the first time
    for child in children:
        for p in getPathsToLeafsFrom(child):       # Recurse for each child
            paths_from_s.append([s] + p)
    downstream_paths[s] = paths_from_s             # Cache this, to use when s is re-visited
    return paths_from_s

all_path = []
for point in points_in_start:
    all_path.extend(getPathsToLeafsFrom(point))

Output:

from pprint import pprint
pprint(all_path)

[[0, 1, 2, 10, 11, 12],
 [0, 1, 2, 10, 11, 15, 16],
 [0, 1, 2, 10, 18, 12],
 [0, 1, 5, 6],
 [0, 8, 2, 10, 11, 12],
 [0, 8, 2, 10, 11, 15, 16],
 [0, 8, 2, 10, 18, 12],
 [3, 2, 10, 11, 12],
 [3, 2, 10, 11, 15, 16],
 [3, 2, 10, 18, 12],
 [3, 4],
 [7, 6]]

Timing result, originally posted code:

10000 loops, best of 3: 63 µs per loop

Timing result, optimized code:

10000 loops, best of 3: 43.2 µs per loop
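To get a feel for how the two approaches compare on something larger than the toy input, one could time them on a synthetic graph. The sketch below is only an illustration under stated assumptions: `layered_dag`, `paths_plain` and `paths_memo` are hypothetical helper names, the layered graph shape and its sizes are arbitrary, and the memoized variant here also caches leaves for brevity. No timings are claimed; results depend on the machine and on the graph.

```python
import timeit

def layered_dag(layers, width):
    """ASSUMED test graph: every node in a layer points to every node in the next layer."""
    adj = {}
    for layer in range(layers):
        for i in range(width):
            node = layer * width + i
            if layer + 1 < layers:
                adj[node] = [(layer + 1) * width + j for j in range(width)]
            else:
                adj[node] = []            # last layer: leaves
    return adj

def paths_plain(adj, starts):
    """The approach from the question: one shared path, copied at each leaf."""
    all_path, path = [], []
    def visit(s):
        path.append(s)
        if not adj.get(s):
            all_path.append(path[:])
        else:
            for child in adj[s]:
                visit(child)
        path.pop()
    for s in starts:
        visit(s)
    return all_path

def paths_memo(adj, starts):
    """The memoized approach: downstream paths of each node are built once."""
    downstream_paths = {}
    def visit(s):
        if s in downstream_paths:
            return downstream_paths[s]
        children = adj.get(s, [])
        if not children:
            result = [[s]]
        else:
            result = [[s] + p for child in children for p in visit(child)]
        downstream_paths[s] = result
        return result
    all_path = []
    for s in starts:
        all_path.extend(visit(s))
    return all_path

adj = layered_dag(layers=8, width=3)
starts = [0, 1, 2]                        # the whole first layer
plain = paths_plain(adj, starts)
memo = paths_memo(adj, starts)
assert sorted(plain) == sorted(memo)      # both must find the same paths

t_plain = timeit.timeit(lambda: paths_plain(adj, starts), number=20)
t_memo = timeit.timeit(lambda: paths_memo(adj, starts), number=20)
print(f"{len(plain)} paths; plain: {t_plain:.3f}s, memoized: {t_memo:.3f}s")
```

Keep in mind that the number of root-to-leaf paths in a DAG can grow exponentially with its depth, and every path still has to be written out, so memoization removes repeated traversal but cannot make the output itself smaller.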

Answer 1
0, accepted, 2020-11-11 11:45:37

Answer 2
0, 2020-11-11 23:57:27
