DFS to find all possible paths is very slow
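I wrote a DFS-like algorithm to find all possible paths starting from level zero. With 2,000 nodes and 5,000 edges, the code below runs extremely slowly. Any suggestions for this algorithm?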
all_path = []

def printAllPathsUntil(s, path):
    path.append(s)
    if s not in adj or len(adj[s]) <= 0:
        all_path.append(path[:])  # EDIT2
    else:
        for i in adj[s]:
            printAllPathsUntil(i, path)
    path.pop()

for point in points_in_start:
    path = []
    printAllPathsUntil(point, path)
Here adj holds the edges: each start position is a key, and its value is the list of destinations.
points_in_start = [0, 3, 7]
adj = {0: [1, 8],
       1: [2, 5],
       2: [],
       3: [2, 4],
       4: [],
       5: [6],
       6: [],
       7: [6],
       8: [2]
       }
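The problem with your algorithm is that it does a lot of repeated work. That is not the case in your example, because whenever a node is reached from two other nodes it is a leaf node, like C. But imagine an edge from D to B: the entire sub-graph starting at B would then be visited again! For a graph with 2,000 nodes, this results in a significant slowdown.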
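To make the repeated work concrete, here is a small sketch that counts how often a naive DFS enters each node. The graph `demo_adj` and the helper `count_visits` are made up for illustration and are not from the question; node 1 is a non-leaf reachable from both 0 and 3, so everything below it gets walked twice.

```python
from collections import Counter

# Hypothetical graph: non-leaf node 1 is a child of both 0 and 3.
demo_adj = {0: [1], 3: [1], 1: [2, 5], 2: [], 5: [6], 6: []}

def count_visits(adj, starts):
    """Run a naive DFS (no memoization) and count entries per node."""
    visits = Counter()

    def dfs(node):
        visits[node] += 1
        for child in adj.get(node, []):
            dfs(child)

    for start in starts:
        dfs(start)
    return visits

visits = count_visits(demo_adj, [0, 3])
print(dict(visits))  # prints {0: 1, 1: 2, 2: 2, 5: 2, 6: 2, 3: 1}
```

Every node under the shared node 1 is entered twice. With several shared nodes stacked on top of each other, these repeat counts multiply.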
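To counter this, you can use memoization, but that means you have to restructure your algorithm. Instead of appending to the existing path and then adding that path to all_paths, the function has to return the (partial) paths starting at the current node, and the caller combines them with the current node to form the full paths. Then, when you visit B again from another node, functools.lru_cache lets you reuse all those partial results.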
from functools import lru_cache

@lru_cache(None)
def getAllPathsUntil(s):
    if s not in adj or not adj[s]:
        return [[s]]
    else:
        return [[s, *p] for a in adj[s]
                        for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))
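One way to check that the cache is actually being reused is the `cache_info()` method that `lru_cache` attaches to the decorated function. The sketch below repeats the memoized function on the question's sample graph and prints the hit and miss counts. The leaf test is shortened to `adj.get(s)`, which is my own simplification.

```python
from functools import lru_cache

adj = {0: [1, 8], 1: [2, 5], 2: [], 3: [2, 4], 4: [],
       5: [6], 6: [], 7: [6], 8: [2]}
points_in_start = [0, 3, 7]

@lru_cache(maxsize=None)
def getAllPathsUntil(s):
    # A node with no (or empty) adjacency list is a leaf.
    if not adj.get(s):
        return [[s]]
    return [[s, *p] for a in adj[s] for p in getAllPathsUntil(a)]

all_paths = []
for point in points_in_start:
    all_paths.extend(getAllPathsUntil(point))

info = getAllPathsUntil.cache_info()
print(info.hits, info.misses)  # prints 3 9
```

The nine misses are the nine distinct nodes. The three hits are the repeated arrivals at nodes 2 (twice) and 6 (once). Note that the cached lists are shared between calls, so callers should treat the returned paths as read-only.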
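As already pointed out in the comments and the other answer, remembering the downstream paths of previously visited nodes is one area for optimization.
Here is my attempt at implementing that.
Here, downstream_paths is a dictionary in which we remember, for each visited non-leaf node, the downstream paths from that node.
I have included %%timeit results for a small test case containing a "revisited non-leaf" node. Since my test case has only one non-leaf node that is revisited, the improvement is modest. On your large-scale dataset the gap in performance may well be bigger.
Input data: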
points_in_start = [0, 3, 7]
adj = {0: [1, 8],
       1: [2, 5],
       2: [],        # Overridden by the later entry 2: [10]
       3: [2, 4],
       4: [],
       5: [6],
       6: [],
       7: [6],
       8: [2],       # Non-leaf node "2" is a child of both "8" and "3"
       2: [10],
       10: [11, 18],
       11: [12, 15],
       12: [],
       15: [16],
       16: [],
       18: [12]
       }
Modified code:
%%timeit
downstream_paths = {}  # Maps each node to its
                       # list of downstream paths
                       # starting with that node.

def getPathsToLeafsFrom(s):  # Returns list of downstream paths starting from s
                             # and ending in some leaf node.
    children = adj.get(s, [])
    if not children:  # s is a leaf
        paths_from_s = [[s]]
    else:  # s is a non-leaf
        ds_paths = downstream_paths.get(s, [])  # Check if s was previously visited
        if ds_paths:  # s was previously visited
            paths_from_s = ds_paths
        else:  # s was not visited earlier
            paths_from_s = []  # Initialize
            for child in children:
                paths_from_child = getPathsToLeafsFrom(child)  # Recurse for each child
                for p in paths_from_child:
                    paths_from_s.append([s] + p)
            downstream_paths[s] = paths_from_s  # Cache this, to use when s is re-visited
    return paths_from_s

all_path = []  # Named all_path to match the pprint call in the output section
for point in points_in_start:
    all_path.extend(getPathsToLeafsFrom(point))
Output:
from pprint import pprint
pprint(all_path)
[[0, 1, 2, 10, 11, 12],
[0, 1, 2, 10, 11, 15, 16],
[0, 1, 2, 10, 18, 12],
[0, 1, 5, 6],
[0, 8, 2, 10, 11, 12],
[0, 8, 2, 10, 11, 15, 16],
[0, 8, 2, 10, 18, 12],
[3, 2, 10, 11, 12],
[3, 2, 10, 11, 15, 16],
[3, 2, 10, 18, 12],
[3, 4],
[7, 6]]
Timing results, originally posted code:
10000 loops, best of 3: 63 µs per loop
Timing results, optimized code:
10000 loops, best of 3: 43.2 µs per loop
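The numbers above come from the IPython %%timeit cell magic. Outside IPython, a similar measurement can be sketched with the standard timeit module. The wrapper `collect_paths_memo` below is a hypothetical, condensed rewrite of the optimized code (not the answer's exact function), and it builds a fresh cache on every call so that each timed run starts cold, as in the cell above. Absolute timings will differ by machine.

```python
import timeit

adj = {0: [1, 8], 1: [2, 5], 3: [2, 4], 4: [], 5: [6], 6: [], 7: [6],
       8: [2], 2: [10], 10: [11, 18], 11: [12, 15], 12: [], 15: [16],
       16: [], 18: [12]}
points_in_start = [0, 3, 7]

def collect_paths_memo():
    """Collect all root-to-leaf paths, caching per-node downstream paths."""
    downstream_paths = {}  # fresh cache for every timed call

    def paths_from(s):
        children = adj.get(s, [])
        if not children:  # leaf node
            return [[s]]
        if s not in downstream_paths:
            downstream_paths[s] = [[s] + p
                                   for child in children
                                   for p in paths_from(child)]
        return downstream_paths[s]

    result = []
    for point in points_in_start:
        result.extend(paths_from(point))
    return result

runs = 10_000
seconds = timeit.timeit(collect_paths_memo, number=runs)
print(f"{seconds / runs * 1e6:.1f} µs per call")
```

Timing the original recursive version the same way, wrapped in its own function, gives a like-for-like comparison.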