
Redundant Checks in Python Implementation of Dijkstra's Algorithm

EDIT: I've added some output to highlight what I believe the problem is.

There are so many versions of Dijkstra's Algorithm out there, and when you are learning, it is hard to assess their quality.

The implementation below appears to be from a reputable source (https://bradfieldcs.com/algos/graphs/dijkstras-algorithm/).

However, it seems that since this version doesn't keep track of visited nodes, these lines:

for neighbor, weight in graph[current_vertex].items():
    distance = current_distance + weight

potentially create a lot of unnecessary checking.

The output from the code below is:

Neighbor: V, weight: 6, distance: 6
Neighbor: W, weight: 7, distance: 7
Neighbor: U, weight: 6, distance: 12
Neighbor: X, weight: 10, distance: 16
Neighbor: U, weight: 7, distance: 14
Neighbor: X, weight: 1, distance: 8
Neighbor: W, weight: 1, distance: 9
Neighbor: V, weight: 10, distance: 18
{'U': 0, 'V': 6, 'W': 7, 'X': 8}

To me this suggests that the algorithm is doing unnecessary work: node U becomes a neighbor multiple times, its distance is calculated as twice the distance already found, and it is therefore rejected. My understanding is that once a node is processed, it no longer needs to be considered. I may be misunderstanding the algorithm, but this looks suspicious to me.

Since keeping track of visited nodes seems integral to the definition of Dijkstra's Algorithm, is it fair to say that this particular implementation is "not great"? Or am I missing something?

It would be great to see a "best practices" version of Dijkstra's Algorithm in Python, preferably using the same kind of structure for the graph.

import heapq


def calculate_distances(graph, starting_vertex):
    distances = {vertex: float('infinity') for vertex in graph}
    distances[starting_vertex] = 0

    pq = [(0, starting_vertex)]
    while len(pq) > 0:
        current_distance, current_vertex = heapq.heappop(pq)

        # Nodes can get added to the priority queue multiple times. We only
        # process a vertex the first time we remove it from the priority queue.
        if current_distance > distances[current_vertex]:
            continue

        for neighbor, weight in graph[current_vertex].items():
            distance = current_distance + weight
            print(f"Neighbor: {neighbor}, weight: {weight}, distance: {distance}")

            # Only consider this new path if it's better than any path we've
            # already found.
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))

    return distances


example_graph = {
    'U': {'V': 6, 'W': 7},
    'V': {'U': 6, 'X': 10},
    'W': {'U': 7, 'X': 1},
    'X': {'W': 1, 'V': 10}
}
print(calculate_distances(example_graph, 'U'))

"My understanding is that once a node is processed, it no longer needs to be considered."

If by "considered" you mean that its distance along the path is calculated, then this is true, but also consider that comparing a distance with the best value so far is not significantly more complex than checking whether a neighbor was already visited. In either case (algorithm), a truly visited node (i.e. a node that has been popped from the heap) will never be pushed onto the heap again.
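For illustration, here is a minimal sketch of the original routine with an explicit visited set added (the name calculate_distances_visited and the visited variable are hypothetical, not from the linked article). Note that the distance comparison is still needed, because a not-yet-visited neighbor may already sit in the heap with a worse tentative distance:

import heapq

def calculate_distances_visited(graph, starting_vertex):
    # Same structure as the original, plus an explicit visited set.
    distances = {vertex: float('infinity') for vertex in graph}
    distances[starting_vertex] = 0
    visited = set()

    pq = [(0, starting_vertex)]
    while pq:
        current_distance, current_vertex = heapq.heappop(pq)
        if current_vertex in visited:  # replaces the stale-entry check
            continue
        visited.add(current_vertex)

        for neighbor, weight in graph[current_vertex].items():
            if neighbor in visited:    # a finished node is never pushed again
                continue
            distance = current_distance + weight
            # Still needed: the neighbor may already be in the heap with a
            # worse tentative distance, and we only push improvements.
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))

    return distances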

Let's look at a variant of the algorithm where (only) the concept of "visited" is used to determine whether a neighbor should be put on the heap. I have intentionally tried to limit code changes, so the differences can be highlighted better:

import heapq

INF = float('infinity')
def calculate_distances_2(graph, starting_vertex):
    distances = {vertex: INF for vertex in graph}
    pq = [(0, starting_vertex)]
    while len(pq) > 0:
        current_distance, current_vertex = heapq.heappop(pq)
        if distances[current_vertex] != INF:  # Already visited?
            continue
        distances[current_vertex] = current_distance
        for neighbor, weight in graph[current_vertex].items():
            print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap? {distances[neighbor] == INF}")
            if distances[neighbor] == INF:  # Not yet visited?
                heapq.heappush(pq, (current_distance + weight, neighbor))
    return distances

So what is different here?

  • The distance of a node is only set when the node is popped off the heap, and this also serves to mark the node as visited: it no longer has infinity as its associated distance. This means that:

    • we don't set distances[starting_vertex] = 0 before the loop starts.
    • we only check whether a neighbor has been visited (implicitly, by checking whether distances[neighbor] is still infinity), but don't compare whether the current neighbor's distance is an improvement. This is entirely left to the heap mechanics now.
  • A neighbor's distance along the current path does not have to be calculated when the node was already visited.

The first point practically means that the second algorithm may push a node onto the heap (again), while the first algorithm might not. In the worst case there is no difference, but in random cases we can expect such a difference to occur. This is because the first algorithm uses more information: when the same node is already present one or more times on the heap, the first algorithm knows the shortest distance among the traversed paths to that node, while the second algorithm "only" knows that this node has not yet been visited (i.e. has not yet been popped).

Concrete example

For your example there is no difference. I tried with this graph:

[image: the weighted graph defined as example_graph in the code below]

...and used the code below to make the comparison. Note that I changed your print call: I removed the output of distance (as in the second algorithm it is not yet calculated at that point), and added one more piece of information: whether the neighbor will be pushed onto the heap or not (False/True):

import heapq

INF = float('infinity')

def calculate_distances(graph, starting_vertex):
    distances = {vertex: INF for vertex in graph}
    distances[starting_vertex] = 0

    pq = [(0, starting_vertex)]
    while len(pq) > 0:
        current_distance, current_vertex = heapq.heappop(pq)
        if current_distance > distances[current_vertex]:
            continue
        for neighbor, weight in graph[current_vertex].items():
            distance = current_distance + weight
            print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap?: {distance < distances[neighbor]}")
            if distance < distances[neighbor]:
                distances[neighbor] = distance
                heapq.heappush(pq, (distance, neighbor))
    return distances


### Alternative

def calculate_distances_2(graph, starting_vertex):
    distances = {vertex: INF for vertex in graph}
    pq = [(0, starting_vertex)]
    while len(pq) > 0:
        current_distance, current_vertex = heapq.heappop(pq)
        if distances[current_vertex] != INF:
            continue
        distances[current_vertex] = current_distance
        for neighbor, weight in graph[current_vertex].items():
            print(f"Neighbor: {neighbor}, weight: {weight}, goes on heap? {distances[neighbor] == INF}")
            if distances[neighbor] == INF:
                heapq.heappush(pq, (current_distance + weight, neighbor))
    return distances


example_graph = {
    "0": { "1": 2, "2": 6 },
    "1": { "0": 2, "3": 5 },
    "2": { "0": 6, "3": 8 },
    "3": { "1": 5, "2": 8, "4": 10, "5": 15 },
    "4": { "3": 10, "5": 6, "6": 2 },
    "5": { "3": 15, "4": 6, "6": 6 },
    "6": { "4": 2, "5": 6 }
}

print(calculate_distances(example_graph, '0'))
print(calculate_distances_2(example_graph, '0'))

I provide here the output generated by the first algorithm only, and mark the lines where the second algorithm has a different output:

Neighbor: 1, weight: 2, goes on heap?: True
Neighbor: 2, weight: 6, goes on heap?: True
Neighbor: 0, weight: 2, goes on heap?: False
Neighbor: 3, weight: 5, goes on heap?: True
Neighbor: 0, weight: 6, goes on heap?: False
Neighbor: 3, weight: 8, goes on heap?: False ****
Neighbor: 1, weight: 5, goes on heap?: False
Neighbor: 2, weight: 8, goes on heap?: False
Neighbor: 4, weight: 10, goes on heap?: True
Neighbor: 5, weight: 15, goes on heap?: True
Neighbor: 3, weight: 10, goes on heap?: False
Neighbor: 5, weight: 6, goes on heap?: False ****
Neighbor: 6, weight: 2, goes on heap?: True
Neighbor: 4, weight: 2, goes on heap?: False
Neighbor: 5, weight: 6, goes on heap?: False ****
Neighbor: 3, weight: 15, goes on heap?: False
Neighbor: 4, weight: 6, goes on heap?: False
Neighbor: 6, weight: 6, goes on heap?: False
{'0': 0, '1': 2, '2': 6, '3': 7, '4': 17, '5': 22, '6': 19}

The places where the output is different (3 places, marked with ****) indicate where the first algorithm outputs False and the second outputs True.

Conclusion

Attribute    First algorithm    Second algorithm
Heap size    Better             Worse
Additions    Worse              Better

The heap size will in random cases be more decisive for execution times, and so the first algorithm is expected to run slightly faster.
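To check the heap-size claim empirically, one could count how often heappush is called in each version. The sketch below is an illustration only: the helper count_pushes and the random graph construction are hypothetical (not part of the article or the code above), and it assumes calculate_distances and calculate_distances_2 are defined in the same module. Remove their print calls first, or expect a lot of per-neighbor output.

import heapq
import random

def count_pushes(func, graph, start):
    # Temporarily replace heapq.heappush with a counting wrapper; both
    # functions look the name up on the heapq module at call time, so the
    # wrapper sees every push. The original is restored afterwards.
    original_push = heapq.heappush
    counter = {'pushes': 0}
    def counting_push(heap, item):
        counter['pushes'] += 1
        original_push(heap, item)
    heapq.heappush = counting_push
    try:
        func(graph, start)
    finally:
        heapq.heappush = original_push
    return counter['pushes']

# A small random undirected graph: a ring (to keep it connected) plus extra edges.
random.seed(42)
n = 50
vertices = [str(i) for i in range(n)]
random_graph = {v: {} for v in vertices}
edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
edges += [tuple(random.sample(vertices, 2)) for _ in range(3 * n)]
for a, b in edges:
    w = random.randint(1, 20)
    random_graph[a][b] = w
    random_graph[b][a] = w

print("pushes, first algorithm: ", count_pushes(calculate_distances, random_graph, '0'))
print("pushes, second algorithm:", count_pushes(calculate_distances_2, random_graph, '0'))

On a graph like this the first version should never report more pushes than the second, which is the "heap size" advantage from the table.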
