Redundant Checks in a Python Implementation of Dijkstra's Algorithm

Question

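Edit: I've added some output to highlight what I believe the issue is.

There are many versions of Dijkstra's algorithm out there, and when you are learning it is hard to assess their quality.

The implementation below appears to come from a reputable source (https://bradfieldcs.com/algos/graphs/dijkstras-algorithm/).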

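However, it seems that because this version doesn't keep track of visited nodes, these lines:

for neighbor, weight in graph[current_vertex].items():
    distance = current_distance + weight

may cause a lot of unnecessary checks.

The output from the code below is: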

Neighbor: V, weight: 6, distance: 6
Neighbor: W, weight: 7, distance: 7
Neighbor: U, weight: 6, distance: 12
Neighbor: X, weight: 10, distance: 16
Neighbor: U, weight: 7, distance: 14
Neighbor: X, weight: 1, distance: 8
Neighbor: W, weight: 1, distance: 9
Neighbor: V, weight: 10, distance: 18
{'U': 0, 'V': 6, 'W': 7, 'X': 8}

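To me this suggests that the algorithm is doing unnecessary work: node U turns up as a neighbor multiple times, its distance is computed as twice a distance that was already calculated (6 + 6 = 12 and 7 + 7 = 14), and the result is then rejected. My understanding is that once a node has been processed, it no longer needs to be considered. I may be misunderstanding the algorithm, but this looks suspicious to me.

Since keeping track of visited nodes seems integral to the definition of Dijkstra's algorithm, is it fair to say that this particular implementation is "not great"? Or am I missing something?

It would be great to see a "best practices" version of Dijkstra's algorithm in Python, preferably using the same kind of structure for the graph.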

import heapq


def calculate_distances(graph, starting_vertex):
    distances = {vertex: float('infinity') for vertex in graph}
    distances[starting_vertex] = 0

    pq = [(0, starting_vertex)]
    while len(pq) > 0:
        current_distance, current_vertex = heapq.heappop(pq)

        # Nodes can get added to the priority queue multiple times. We only
        # process a vertex the first time we remove it from the priority queue.
        if current_distance > distances[current_vertex]:
            continue

        for neighbor, weight in graph[current_vertex].items():
            distance = current_distance + weight
            print(f"Neighbor: {neighbor}, weight: {weight}, distance: {distance}")

            # Only consider this new path if it's better than any path we've
            # already found.
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))

    return distances


example_graph = {
    'U': {'V': 6, 'W': 7},
    'V': {'U': 6, 'X': 10},
    'W': {'U': 7, 'X': 1},
    'X': {'W': 1, 'V': 10}
}
print(calculate_distances(example_graph, 'U'))

Answer 1

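"My understanding is that once a node has been processed, it no longer needs to be considered."

If by "considered" you mean that its distance along the path is calculated, then this is true. But also consider that comparing a distance with the best value found so far is not significantly more expensive than checking whether a neighbor has already been visited. In either case (either algorithm), a truly visited node (i.e. a node that has been popped from the heap and processed) will never be pushed onto the heap again.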
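The claim that a processed node is never pushed again can be checked empirically. The following is a minimal sketch, not part of the original code: it repeats the question's `calculate_distances` logic without the `print` call, and the function name `calculate_distances_checked`, the `popped` set, and the `pushed_after_pop` list are hypothetical additions used only for instrumentation. It assumes non-negative edge weights.

```python
import heapq


def calculate_distances_checked(graph, starting_vertex):
    """Question's algorithm plus bookkeeping that records any vertex
    pushed onto the heap after it has already been processed."""
    distances = {vertex: float('infinity') for vertex in graph}
    distances[starting_vertex] = 0
    popped = set()          # vertices already processed (illustrative helper)
    pushed_after_pop = []   # violations of the claim; expected to stay empty

    pq = [(0, starting_vertex)]
    while pq:
        current_distance, current_vertex = heapq.heappop(pq)
        if current_distance > distances[current_vertex]:
            continue  # stale entry: a shorter path was found after this push
        popped.add(current_vertex)

        for neighbor, weight in graph[current_vertex].items():
            distance = current_distance + weight
            if distance < distances[neighbor]:
                if neighbor in popped:
                    pushed_after_pop.append(neighbor)
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))

    return distances, pushed_after_pop


example_graph = {
    'U': {'V': 6, 'W': 7},
    'V': {'U': 6, 'X': 10},
    'W': {'U': 7, 'X': 1},
    'X': {'W': 1, 'V': 10}
}
distances, pushed_after_pop = calculate_distances_checked(example_graph, 'U')
print(distances)
print(pushed_after_pop)
```

On the question's graph the list of violations stays empty: the distance comparison alone is enough to keep processed nodes off the heap, without a separate visited set.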

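Let's look at a variant of the algorithm in which (only) the concept of "visited" is used to decide whether a neighbor should be put on the heap. I have intentionally tried to limit the code changes, so that the differences stand out better: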

import heapq

INF = float('infinity')
def calculate_distances_2(graph, starting_vertex):
    distances = {vertex: INF for vertex in graph}
    pq = [(0, starting_vertex)]
    while len(pq) > 0:
        current_distance, current_vertex = heapq.heappop(pq)
        if distances[current_vertex] != INF:  # Already visited?
            continue
        distances[current_vertex] = current_distance
        for neighbor, weight in graph[current_vertex].items():
            print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap? {distances[neighbor] == INF}")
            if distances[neighbor] == INF:  # Not yet visited?
                heapq.heappush(pq, (current_distance + weight, neighbor))
    return distances

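So what is different here?

- The distance of a node is only set when the node is popped off the heap, which also serves to mark the node as visited: it no longer has Infinity as its associated distance. This means that:
  - we don't set distances[starting_vertex] = 0 before the loop starts.
  - we only check whether a neighbor has been visited (implicitly, by checking whether distances[neighbor] is Infinity), but don't compare whether the current neighbor's distance is an improvement. That is now left entirely to the heap mechanism.
- The distance of a neighbor along the current path does not have to be calculated when that node has already been visited.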

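The first point actually means that the second algorithm may push a node onto the heap (again) where the first algorithm would not. In the worst case there is no difference, but in random cases we can expect this difference to occur. This is because the first algorithm uses more information: when the same node is already present on the heap one or more times, the first algorithm knows the shortest distance among the paths traversed to that node so far, while the second algorithm "only" knows that this node has not yet been visited (i.e. has not been popped yet).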
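A small sketch can make this difference visible. The three-vertex graph `triangle` is a hypothetical example (not taken from the question), and `push_log_distance_check` and `push_log_visited_check` are illustrative names: they mirror the two algorithms above but return the sequence of entries pushed onto the heap instead of the distances.

```python
import heapq

INF = float('infinity')


def push_log_distance_check(graph, start):
    """First algorithm: push a neighbor only if its distance improves."""
    distances = {v: INF for v in graph}
    distances[start] = 0
    pq, pushes = [(0, start)], [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > distances[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < distances[v]:
                distances[v] = d + w
                heapq.heappush(pq, (d + w, v))
                pushes.append((d + w, v))
    return pushes


def push_log_visited_check(graph, start):
    """Second algorithm: push a neighbor whenever it is not yet visited."""
    distances = {v: INF for v in graph}
    pq, pushes = [(0, start)], [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if distances[u] != INF:
            continue  # already visited
        distances[u] = d
        for v, w in graph[u].items():
            if distances[v] == INF:
                heapq.heappush(pq, (d + w, v))
                pushes.append((d + w, v))
    return pushes


# Hypothetical graph: the direct edge A-C (5) beats the detour A-B-C (1 + 10).
triangle = {
    'A': {'B': 1, 'C': 5},
    'B': {'A': 1, 'C': 10},
    'C': {'A': 5, 'B': 10},
}
print(push_log_distance_check(triangle, 'A'))
print(push_log_visited_check(triangle, 'A'))
```

When B is processed, C is already on the heap with distance 5. The first variant sees that 11 is no improvement and skips the push; the second variant only knows that C has not been popped yet, so it pushes (11, 'C') as an extra entry that is later discarded.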

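Concrete example

For your example, there is no difference. I tried with this graph (defined as example_graph in the code below):

...and used the code below to make the comparison. Note that I changed your print call: I removed the output of distance (since in the second algorithm it has not been calculated yet at that point), and added one more piece of information: whether the neighbor will be pushed onto the heap (False/True):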

import heapq

INF = float('infinity')

def calculate_distances(graph, starting_vertex):
    distances = {vertex: INF for vertex in graph}
    distances[starting_vertex] = 0

    pq = [(0, starting_vertex)]
    while len(pq) > 0:
        current_distance, current_vertex = heapq.heappop(pq)
        if current_distance > distances[current_vertex]:
            continue
        for neighbor, weight in graph[current_vertex].items():
            distance = current_distance + weight
            print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap?: {distance < distances[neighbor]}")
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))
    return distances


### Alternative

def calculate_distances_2(graph, starting_vertex):
    distances = {vertex: INF for vertex in graph}
    pq = [(0, starting_vertex)]
    while len(pq) > 0:
        current_distance, current_vertex = heapq.heappop(pq)
        if distances[current_vertex] != INF:
            continue
        distances[current_vertex] = current_distance
        for neighbor, weight in graph[current_vertex].items():
            print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap? {distances[neighbor] == INF}")
            if distances[neighbor] == INF:
                heapq.heappush(pq, (current_distance + weight, neighbor))
    return distances


example_graph = {
    "0": { "1": 2, "2": 6 },
    "1": { "0": 2, "3": 5 },
    "2": { "0": 6, "3": 8 },
    "3": { "1": 5, "2": 8, "4": 10, "5": 15 },
    "4": { "3": 10, "5": 6, "6": 2 },
    "5": { "3": 15, "4": 6, "6": 6 },
    "6": { "4": 2, "5": 6 }
}

print(calculate_distances(example_graph, '0'))
print(calculate_distances_2(example_graph, '0'))

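Here I provide only the output generated by the first algorithm, and mark (with ****) the lines where the second algorithm produces different output: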

Neighbor: 1, weight: 2, goes on heap?: True
Neighbor: 2, weight: 6, goes on heap?: True
Neighbor: 0, weight: 2, goes on heap?: False
Neighbor: 3, weight: 5, goes on heap?: True
Neighbor: 0, weight: 6, goes on heap?: False
Neighbor: 3, weight: 8, goes on heap?: False ****
Neighbor: 1, weight: 5, goes on heap?: False
Neighbor: 2, weight: 8, goes on heap?: False
Neighbor: 4, weight: 10, goes on heap?: True
Neighbor: 5, weight: 15, goes on heap?: True
Neighbor: 3, weight: 10, goes on heap?: False
Neighbor: 5, weight: 6, goes on heap?: False ****
Neighbor: 6, weight: 2, goes on heap?: True
Neighbor: 4, weight: 2, goes on heap?: False
Neighbor: 5, weight: 6, goes on heap?: False ****
Neighbor: 3, weight: 15, goes on heap?: False
Neighbor: 4, weight: 6, goes on heap?: False
Neighbor: 6, weight: 6, goes on heap?: False
{'0': 0, '1': 2, '2': 6, '3': 7, '4': 17, '5': 22, '6': 19}

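The places where the output differs (3 places) are those where the first algorithm prints False and the second prints True.

Conclusion

Property	First algorithm	Second algorithm
Heap size	Better	Worse
Additions (current_distance + weight)	Worse	Better

In random cases, the heap size will be the more decisive factor for the execution time, so the first algorithm is expected to run slightly faster.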
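The two rows of the table can be backed up with simple counts. The sketch below is an illustration on a single graph, not a benchmark: `run_instrumented` is a hypothetical helper that runs either variant on the seven-vertex graph from the concrete example and counts heap pushes (including the initial entry), the largest heap size reached, and the number of `current_distance + weight` additions.

```python
import heapq

INF = float('infinity')


def run_instrumented(graph, start, use_distance_check):
    """Run the first variant (use_distance_check=True) or the second
    variant (False) and collect simple operation counts."""
    distances = {v: INF for v in graph}
    if use_distance_check:
        distances[start] = 0
    pq = [(0, start)]
    stats = {'pushes': 1, 'max_heap': 1, 'additions': 0}
    while pq:
        d, u = heapq.heappop(pq)
        if use_distance_check:
            if d > distances[u]:
                continue  # stale heap entry
        else:
            if distances[u] != INF:
                continue  # already visited
            distances[u] = d
        for v, w in graph[u].items():
            if use_distance_check:
                # Distance is always computed, then compared.
                new_d = d + w
                stats['additions'] += 1
                push = new_d < distances[v]
                if push:
                    distances[v] = new_d
            else:
                # Distance is computed only for unvisited neighbors.
                push = distances[v] == INF
                if push:
                    new_d = d + w
                    stats['additions'] += 1
            if push:
                heapq.heappush(pq, (new_d, v))
                stats['pushes'] += 1
                stats['max_heap'] = max(stats['max_heap'], len(pq))
    return distances, stats


example_graph = {
    "0": {"1": 2, "2": 6},
    "1": {"0": 2, "3": 5},
    "2": {"0": 6, "3": 8},
    "3": {"1": 5, "2": 8, "4": 10, "5": 15},
    "4": {"3": 10, "5": 6, "6": 2},
    "5": {"3": 15, "4": 6, "6": 6},
    "6": {"4": 2, "5": 6}
}

distances_1, stats_1 = run_instrumented(example_graph, '0', True)
distances_2, stats_2 = run_instrumented(example_graph, '0', False)
print(stats_1)
print(stats_2)
```

On this graph both variants return the same distances; the first pushes fewer entries and keeps a smaller heap, while the second performs fewer additions, which matches the table. How that translates into running time depends on the graph, so real timings would need a proper benchmark.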
