DFS to find all possible paths is very slow

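I wrote a DFS-like algorithm to find all possible paths starting from level zero. With 2,000 nodes and 5,000 edges, the code below runs extremely slowly. Any suggestions for this algorithm?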

    all_path = []

    def printAllPathsUntil(s, path):
        path.append(s)
        if s not in adj or len(adj[s]) <= 0:
            all_path.append(path[:]) # EDIT2
        else:
            for i in adj[s]:
                printAllPathsUntil(i, path)
        path.pop()

    for point in points_in_start:
        path = []
        printAllPathsUntil(point, path)

Here adj holds the edges: each start node is a key, and the list of its destination nodes is the value.

    points_in_start = [0, 3, 7]
    adj = {0: [1, 8],
           1: [2, 5],
           2: [],
           3: [2, 4],
           4: [],
           5: [6],
           6: [],
           7: [6],
           8: [2]
           }
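If the edges start out as a flat list of (source, target) pairs, a mapping of this shape could be built roughly as follows. This is only a sketch: the `edges` list and the `build_adj` helper are hypothetical names that do not appear in the original code.

```python
def build_adj(edges):
    """Group (source, target) pairs into {source: [targets, ...]}."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])  # make sure leaf nodes get an empty list
    return adj

# Hypothetical edge list that reproduces the example graph above
edges = [(0, 1), (0, 8), (1, 2), (1, 5), (3, 2), (3, 4),
         (5, 6), (7, 6), (8, 2)]
adj = build_adj(edges)
```

Giving every leaf an explicit empty list is optional here, since the traversal also checks `s not in adj`.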

EDIT1

  • This is a DAG. There are no cycles.


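The problem with your algorithm is that it does a lot of repeated work. This is not the case in your example, because the only node that is reached from two other nodes is a leaf node, like C. But imagine an edge from D to B: the entire subgraph starting at B would then be visited again! For a graph with 2000 nodes, this will cause a significant slowdown.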
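To get a feel for how quickly the repeated work grows, here is a small experiment. It is only an illustration: the layered test graph and the names `make_layered_dag` and `count_visits` are made up for this sketch and are not taken from the question.

```python
from collections import Counter

def make_layered_dag(layers):
    """Hypothetical test graph: two nodes per layer, each node
    linked to both nodes of the next layer."""
    adj = {}
    for i in range(layers):
        for node in (2 * i, 2 * i + 1):
            adj[node] = [2 * (i + 1), 2 * (i + 1) + 1]
    adj[2 * layers] = []        # the last layer consists of two leaves
    adj[2 * layers + 1] = []
    return adj

def count_visits(adj, start):
    """Plain DFS without memoization, counting how often each node is entered."""
    visits = Counter()

    def dfs(s):
        visits[s] += 1
        for child in adj[s]:
            dfs(child)

    dfs(start)
    return visits

adj_demo = make_layered_dag(10)            # 22 nodes, 40 edges
visits = count_visits(adj_demo, 0)
total_visits = sum(visits.values())
print(total_visits, visits[20])            # prints: 2047 512
```

Even on this 22-node graph the plain DFS enters nodes 2047 times, and each leaf in the last layer is reached 512 times, because every extra layer doubles the number of ways to arrive at a node.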

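To counter this, you can use memoization, but that means you have to restructure your algorithm: instead of adding to the existing path and then adding that path to all_paths, it has to return the (partial) paths starting at the current node and combine them with the full paths at the parent node. You can then use functools.lru_cache to re-use all those partial results when you visit B again from another node.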

from functools import lru_cache

@lru_cache(None)
def getAllPathsUntil(s):
    if s not in adj or not adj[s]:
        return [ [s] ]
    else:
        return [ [s, *p] for a in adj[s]
                         for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))
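One way to check that the cache is actually being used is `cache_info()`, which every `lru_cache`-wrapped function provides. The sketch below repeats the code above together with the question's example data so that it runs on its own; the `info` variable is just an illustrative name.

```python
from functools import lru_cache

# Example graph from the question
points_in_start = [0, 3, 7]
adj = {0: [1, 8], 1: [2, 5], 2: [], 3: [2, 4], 4: [],
       5: [6], 6: [], 7: [6], 8: [2]}

@lru_cache(maxsize=None)
def getAllPathsUntil(s):
    if s not in adj or not adj[s]:
        return [[s]]
    return [[s, *p] for a in adj[s] for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))

# Named tuple with the fields hits, misses, maxsize and currsize
info = getAllPathsUntil.cache_info()
print(all_paths)
print(info)
```

On this small graph there are 3 hits (node 2 twice, node 6 once) against 9 misses. Note that the returned lists are the same objects that sit in the cache, so they should be treated as read-only; mutating them would change later results.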

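As already pointed out in the comments and in the other answer, remembering the downstream paths of previously visited nodes is an area for optimization.

Here is my attempt at implementing that.

Here, downstream_paths is a dictionary in which we remember, for every visited non-leaf node, the downstream paths from that node.

The %%timeit results below are for a small test case that contains one "re-visited non-leaf" situation. Since only one non-leaf node is re-visited in my test case, the improvement is modest. On your large-scale data set, the gap in performance may well be wider.

Input data: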

points_in_start = [0, 3, 7]
adj = {0: [1, 8],
       1: [2, 5],
       2: [10],    # Non-leaf node "2" is a child of "1", "8" and "3"
       3: [2, 4],
       4: [],
       5: [6],
       6: [],
       7: [6],
       8: [2],

       10: [11, 18],
       11: [12, 15],
       12: [],
       15: [16],
       16: [],
       18: [12]
      }

Modified code:

%%timeit

downstream_paths = {}                                 # Maps each node to its
                                                      # list of downstream paths
                                                      # starting with that node.

def getPathsToLeafsFrom(s):      # Returns list of downstream paths starting from s
                                 # and ending in some leaf node.
    children = adj.get(s, [])
    if not children:                                  # s is a Leaf
        paths_from_s = [[s]]
    else:                                             # s is a Non-leaf
        ds_paths = downstream_paths.get(s, [])        # Check if s was previously visited
        if ds_paths:                                  # If s was previously visited.
            paths_from_s = ds_paths
        else:                                         # s was not visited earlier.
            paths_from_s = []                         # Initialize
            for child in children:
                paths_from_child = getPathsToLeafsFrom(child)   # Recurse for each child
                for p in paths_from_child:
                    paths_from_s.append([s] + p)
            downstream_paths[s] = paths_from_s       # Cache this, to use when s is re-visited
    return paths_from_s

all_path = []                                         # Collects the full paths from every start point
for point in points_in_start:
    all_path.extend(getPathsToLeafsFrom(point))

Output:

from pprint import pprint
pprint(all_path)

[[0, 1, 2, 10, 11, 12],
 [0, 1, 2, 10, 11, 15, 16],
 [0, 1, 2, 10, 18, 12],
 [0, 1, 5, 6],
 [0, 8, 2, 10, 11, 12],
 [0, 8, 2, 10, 11, 15, 16],
 [0, 8, 2, 10, 18, 12],
 [3, 2, 10, 11, 12],
 [3, 2, 10, 11, 15, 16],
 [3, 2, 10, 18, 12],
 [3, 4],
 [7, 6]]

Timing results, original posted code:

10000 loops, best of 3: 63 µs per loop

Timing results, optimized code:

10000 loops, best of 3: 43.2 µs per loop
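Outside of IPython, a comparison along these lines could be reproduced with the standard timeit module. The following is a rough sketch rather than the exact benchmark used above: `all_paths_plain` and `all_paths_memo` are hypothetical wrappers around the two approaches, and the memoized version builds a fresh cache on every call so that each timed run starts cold.

```python
import timeit

points_in_start = [0, 3, 7]
adj = {0: [1, 8], 1: [2, 5], 2: [10], 3: [2, 4], 4: [], 5: [6], 6: [],
       7: [6], 8: [2], 10: [11, 18], 11: [12, 15], 12: [], 15: [16],
       16: [], 18: [12]}

def all_paths_plain():
    """Original approach: plain DFS that copies the path at every leaf."""
    all_path = []

    def visit(s, path):
        path.append(s)
        if not adj.get(s):
            all_path.append(path[:])
        else:
            for child in adj[s]:
                visit(child, path)
        path.pop()

    for point in points_in_start:
        visit(point, [])
    return all_path

def all_paths_memo():
    """Memoized approach: downstream paths of each node are computed once."""
    downstream_paths = {}

    def paths_from(s):
        if s in downstream_paths:
            return downstream_paths[s]
        children = adj.get(s, [])
        if not children:
            result = [[s]]
        else:
            result = [[s] + p for child in children for p in paths_from(child)]
        downstream_paths[s] = result
        return result

    all_path = []
    for point in points_in_start:
        all_path.extend(paths_from(point))
    return all_path

# Both approaches should produce the same paths in the same order
assert all_paths_plain() == all_paths_memo()

for fn in (all_paths_plain, all_paths_memo):
    best = min(timeit.repeat(fn, number=1_000, repeat=3)) / 1_000
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per loop")
```

The absolute numbers depend on the machine and the Python version, so only the relative difference between the two lines is meaningful.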
