Dijkstra's algorithm implementation tracing in Python

I'm trying to trace a Python implementation of Dijkstra's algorithm that uses a priority queue, but I can't follow it because I'm new to Python.

Here's the implementation:

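from collections import defaultdict
from heapq import heappop, heappush


def dijkstra(edges, f, t):
    # Build an undirected adjacency map: vertex -> set of (cost, neighbor)
    g = defaultdict(set)
    for l, r, c in edges:
        g[l].add((c, r))
        g[r].add((c, l))

    # q is a min-heap of (total weight, vertex, path leading up to vertex)
    q, seen = [(0, f, ())], set()
    while q:
        # Take the cheapest partial path off the heap
        (weight, v1, path) = heappop(q)
        if v1 not in seen:
            seen.add(v1)
            path += (v1,)  # builds a new tuple that ends in v1
            if v1 == t:
                return weight, path
            # Extend this path to every neighbor not yet finalized
            for k, v2 in g.get(v1, ()):
                if v2 not in seen:
                    heappush(q, (weight + k, v2, path))

    # The heap ran empty: t is not reachable from f
    return float("inf")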
  • First, why did it use g = defaultdict(set) instead of g = defaultdict(list), and .add() instead of .append()?
  • I understand that at the beginning of Dijkstra's algorithm you need to set the weights of all nodes to infinity, but I don't see that here.
  • Also, in which lines does the node decide which path it goes through? In what line is the decision to go left or right made? In simple words: where in the code is the weighted line between the nodes made?

A comment explaining what happens on each line of the code would really help me understand it.

As to your questions:

First, why did it use g = defaultdict(set) instead of g = defaultdict(list), and .add() instead of .append()?

It would work just as well with list. Of course, the method to be used (add or append) follows from this choice. The only advantage I can see is that with set you avoid adding the same edge twice. In general, a graph can have multiple edges between the same two vertices, and they could even have the same weight. When this occurs there is no reason to consider these duplicate edges separately, and the set makes sure the duplicates are ignored.
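The difference can be seen in a small sketch. The edge list below is a made-up example (not from the question) in which the edge between A and B with weight 7 appears twice:

```python
from collections import defaultdict

# Hypothetical edge list for illustration; A-B with weight 7 is listed twice.
edges = [("A", "B", 7), ("A", "B", 7), ("A", "C", 3)]

g_set = defaultdict(set)
g_list = defaultdict(list)
for l, r, c in edges:
    # set: adding an element that is already present has no effect
    g_set[l].add((c, r))
    g_set[r].add((c, l))
    # list: every append is kept, duplicates included
    g_list[l].append((c, r))
    g_list[r].append((c, l))

print(sorted(g_set["A"]))   # [(3, 'C'), (7, 'B')]
print(sorted(g_list["A"]))  # [(3, 'C'), (7, 'B'), (7, 'B')]
```

With the list, the algorithm still returns the same result; it just pushes the duplicate neighbor onto the heap one extra time.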

I understand that at the beginning of Dijkstra's algorithm you need to set the weights of all nodes to infinity, but I don't see that here.

There are different ways to implement the algorithm. Indeed, you could add all vertices to the priority queue at the very start, with all of them except the source vertex starting out with a weight of infinity. But it is a bit more efficient to simply leave those "infinity" vertices out of the queue: this way the queue is smaller, and the first vertices that are added to it are added slightly faster. So any vertex that has not yet been put on the queue is in fact a vertex that still has a weight of infinity.
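For comparison, here is a minimal sketch of a variant that does make the infinity initialization explicit. It is not the code from the question: the function name dijkstra_with_infinity and the dist dictionary are introduced here for illustration, and the function returns only the distance, not the path.

```python
from collections import defaultdict
from heapq import heappop, heappush


def dijkstra_with_infinity(edges, f, t):
    # Illustrative variant (hypothetical name), distance only.
    g = defaultdict(set)
    for l, r, c in edges:
        g[l].add((c, r))
        g[r].add((c, l))

    # Textbook initialization: every known vertex starts at infinity...
    dist = {v: float("inf") for v in g}
    dist[f] = 0  # ...except the source
    q = [(0, f)]
    while q:
        weight, v1 = heappop(q)
        if weight > dist[v1]:
            continue  # outdated heap entry; a cheaper route was found already
        if v1 == t:
            return weight
        for k, v2 in g[v1]:
            if weight + k < dist[v2]:
                dist[v2] = weight + k  # found a cheaper route to v2
                heappush(q, (weight + k, v2))
    return float("inf")


# Hypothetical sample graph
sample_edges = [("A", "B", 1), ("B", "C", 2), ("A", "C", 5)]
print(dijkstra_with_infinity(sample_edges, "A", "C"))  # 3
```

In the question's code the same information is implicit: a vertex that has never been pushed onto the heap plays the role of a vertex whose entry in dist is still infinity.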

Also, in which lines does the node decide which path it goes through? In what line is the decision to go left or right made? In simple words: where in the code is the weighted line between the nodes made?

There is no decision visible in the code. All paths are potential winners until the moment the target node is found. Before that happens, all partial paths that have been constructed are on the heap, and it is the heap's defining property that determines which path will be the next one to be extended to neighboring nodes. Those longer paths (with more vertices) are then thrown onto the heap again, where the magic of the heap does its work once more.

So if you look for a "decision", the only one is made inside the heap: it tells us which of the paths currently on the heap has the least weight. The main loop may therefore work a bit on one path (to extend it), and then in the next iteration work on an entirely different path. It continues like this until it suddenly finds that it has reached the target vertex. Only at that moment are all other paths that were still candidates on the heap ignored.
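To watch this happen, one option is to record every partial path at the moment it is taken off the heap and extended. The sketch below is the question's function with a trace list added; the name dijkstra_traced, the extra return values, and the sample graph are assumptions made for this illustration.

```python
from collections import defaultdict
from heapq import heappop, heappush


def dijkstra_traced(edges, f, t):
    # Same logic as the question's code, plus a record of each extended path.
    g = defaultdict(set)
    for l, r, c in edges:
        g[l].add((c, r))
        g[r].add((c, l))

    trace = []  # (weight, path) for every path the main loop works on
    q, seen = [(0, f, ())], set()
    while q:
        weight, v1, path = heappop(q)
        if v1 in seen:
            continue
        seen.add(v1)
        path += (v1,)
        trace.append((weight, path))
        if v1 == t:
            return weight, path, trace
        for k, v2 in g.get(v1, ()):
            if v2 not in seen:
                heappush(q, (weight + k, v2, path))
    return float("inf"), (), trace


# Hypothetical sample graph: two routes from A to D
sample_edges = [("A", "B", 1), ("A", "C", 2), ("B", "D", 5), ("C", "D", 1)]
weight, path, trace = dijkstra_traced(sample_edges, "A", "D")
for step in trace:
    print(step)
# (0, ('A',))
# (1, ('A', 'B'))
# (2, ('A', 'C'))
# (3, ('A', 'C', 'D'))
```

The loop first works on the path through B, then switches to the path through C because the heap reports it as cheaper than continuing from B to D (weight 6). No line in the function compares the two routes directly.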

If you want to know a bit more about this hidden magic of heappop and heappush, read the Wikipedia article on the subject.
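As a small illustration of what these two functions do, the sketch below pushes three made-up heap entries in arbitrary order and pops them again. Tuples are compared element by element, so the entry with the smallest weight always comes out first:

```python
from heapq import heappop, heappush

# Hypothetical entries in the same (weight, vertex, path) shape as the question's code
q = []
heappush(q, (7, "B", ("A",)))
heappush(q, (3, "C", ("A",)))
heappush(q, (5, "D", ("A",)))

order = []
while q:
    weight, vertex, path = heappop(q)  # always the smallest remaining tuple
    order.append((weight, vertex))

print(order)  # [(3, 'C'), (5, 'D'), (7, 'B')]
```

The insertion order (7, 3, 5) does not matter; heappop returns the entries by increasing weight.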

Not optimal

Although the algorithm is correct, it is not an efficient implementation. The following statement causes a path to be copied, and that path might have up to n elements, so a single execution has a worst-case time complexity of O(n), giving the algorithm a worst-case time complexity of O(n² log n):

path += (v1,)

To avoid this, it is common not to keep track of the paths as a whole, but to store only a backreference to the previous node we came from. Then, when we hit the target node, we can walk back along these backreferences and build the path only once. As storing a backreference takes constant time, this improvement gives the algorithm a time complexity of O(n log n), assuming the number of edges is proportional to n.
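A minimal sketch of that idea follows. It is one possible way to apply backreferences to the question's code, not a definitive implementation; the names dijkstra_backrefs, prev and came_from are introduced here, and the sketch assumes that None is never used as a vertex label.

```python
from collections import defaultdict
from heapq import heappop, heappush


def dijkstra_backrefs(edges, f, t):
    g = defaultdict(set)
    for l, r, c in edges:
        g[l].add((c, r))
        g[r].add((c, l))

    # prev maps each finalized vertex to the vertex it was reached from
    # (None for the source); it also takes over the role of `seen`.
    prev = {}
    q = [(0, f, None)]
    while q:
        weight, v1, came_from = heappop(q)
        if v1 in prev:
            continue  # already finalized via a cheaper route
        prev[v1] = came_from  # constant-time bookkeeping, no path copy
        if v1 == t:
            # Walk the backreferences once to rebuild the path
            path = []
            while v1 is not None:
                path.append(v1)
                v1 = prev[v1]
            return weight, tuple(reversed(path))
        for k, v2 in g.get(v1, ()):
            if v2 not in prev:
                heappush(q, (weight + k, v2, v1))
    return float("inf"), ()


# Hypothetical sample graph
sample_edges = [("A", "B", 1), ("A", "C", 2), ("B", "D", 5), ("C", "D", 1)]
print(dijkstra_backrefs(sample_edges, "A", "D"))  # (3, ('A', 'C', 'D'))
```

Unlike the original, this sketch returns a (weight, path) pair in the unreachable case as well, so callers can always unpack two values.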
