
Dijkstra's algorithm implementation tracing in Python

I'm trying to trace a Python implementation of Dijkstra's algorithm that uses a priority queue, but I couldn't follow it because I'm new to Python.

Here's the implementation:

from collections import defaultdict
from heapq import heappop, heappush

def dijkstra(edges, f, t):
    # Build an adjacency map: node -> set of (cost, neighbor) pairs.
    g = defaultdict(set)
    for l, r, c in edges:        # each edge is (left node, right node, cost)
        g[l].add((c, r))         # the graph is undirected, so record the
        g[r].add((c, l))         # edge in both directions

    # The heap q holds (weight, node, path-so-far) tuples; it starts with
    # the source f at distance 0. seen collects finalized nodes.
    q, seen = [(0, f, ())], set()
    while q:
        # Pop the entry with the smallest weight off the heap.
        (weight, v1, path) = heappop(q)
        if v1 not in seen:
            seen.add(v1)         # v1's shortest distance is now final
            path += (v1,)        # extend the path with v1
            if v1 == t:          # reached the target: done
                return weight, path
            # Push every unvisited neighbor with the accumulated weight.
            for k, v2 in g.get(v1, ()):
                if v2 not in seen:
                    heappush(q, (weight + k, v2, path))

    # The queue ran dry without reaching t: t is unreachable.
    return float("inf")
  • First, why did it use g = defaultdict(set) instead of g = defaultdict(list), and .add() instead of .append()?
  • I understand that at the beginning of Dijkstra's algorithm you need to set the weights of all nodes to infinity, but I don't see that here.
  • Also, on which lines does the code decide which path to follow, i.e. on what line is the decision to go left or right made? In simple words, where in the code is the weighted edge between the nodes chosen?

A comment explaining what happens on each line of the code would be really helpful for me to understand it.

As to your questions:

First, why did it use g = defaultdict(set) instead of g = defaultdict(list), and .add() instead of .append()?

It would work just as well with list. Of course, the method to use (add or append) follows from this choice. The only advantage I can see is that with a set you avoid adding the same edge twice. In general, a graph can have multiple edges between the same two vertices, and they could even have the same weight; when this occurs there is no reason to consider the duplicate edges separately, and the set makes sure they are ignored.
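To illustrate (with a made-up edge list), if the same edge appears twice in the input, a set stores it once while a list stores it twice:

```python
from collections import defaultdict

# Hypothetical input: the same edge listed twice with the same weight.
edges = [("A", "B", 7), ("A", "B", 7)]

g_set = defaultdict(set)
g_list = defaultdict(list)
for l, r, c in edges:
    g_set[l].add((c, r))      # duplicates collapse into one entry
    g_list[l].append((c, r))  # duplicates are kept

print(g_set["A"])   # {(7, 'B')}
print(g_list["A"])  # [(7, 'B'), (7, 'B')]
```

With the list version, the duplicate edge would be pushed onto the heap twice; the algorithm would still be correct, just doing a little redundant work.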

I understand that at the beginning of Dijkstra's algorithm you need to set the weights of all nodes to infinity, but I don't see that here.

There are different ways to implement the algorithm. Indeed, you could add all vertices to the priority queue at the very start, where all of them except the source vertex start out with an infinite weight. But it is a bit more efficient to just exclude those "infinity" vertices from the queue: this way the queue is smaller, and the first vertices added to it are added slightly faster. So any vertex that is not on the queue is, in effect, a vertex that still has a weight of infinity.
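For comparison, here is a rough sketch of that "textbook" variant, using the same edges format as the question. It initializes every known vertex to infinity up front and returns only the distance; the function name dijkstra_eager and the skip-stale-entries trick are my own, not from the original code:

```python
import heapq
from collections import defaultdict

def dijkstra_eager(edges, f, t):
    g = defaultdict(set)
    for l, r, c in edges:
        g[l].add((c, r))
        g[r].add((c, l))

    # Every vertex starts at infinity, except the source at 0.
    dist = {v: float("inf") for v in g}
    dist[f] = 0
    q = [(w, v) for v, w in dist.items()]
    heapq.heapify(q)

    while q:
        w, v1 = heapq.heappop(q)
        if w > dist[v1]:            # stale entry (a shorter route was found later): skip
            continue
        if v1 == t:
            return w
        for k, v2 in g[v1]:
            if w + k < dist[v2]:    # found a shorter route to v2: relax it
                dist[v2] = w + k
                heapq.heappush(q, (w + k, v2))
    return float("inf")
```

Note how the queue starts at full size here, which is exactly the overhead the lazy version in the question avoids.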

Also, on which lines does the code decide which path to follow, i.e. on what line is the decision to go left or right made? In simple words, where in the code is the weighted edge between the nodes chosen?

There is no explicit decision visible in the code. All paths are potential winners until the moment the target node is found. Until then, every partial path that has been constructed sits on the heap, and it is the heap's ordering property that determines which path will be the next one extended to neighboring nodes. Those longer paths (with one more vertex) are then thrown back onto the heap, where the magic of the heap does its work again. So if you are looking for a "decision", the only decision is made inside the heap: it tells us which path currently on the heap has the least weight. The main loop may therefore work a bit on one path (extending it), and in the next iteration work on an entirely different one. This continues until it suddenly finds that it has reached the target vertex. Only at that moment are all the other paths that were still candidates on the heap ignored.
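A tiny standalone demonstration of that heap behaviour (weights and node names are made up): whichever (weight, node, path) tuple has the smallest weight comes off first, no matter when it was pushed:

```python
import heapq

q = []
heapq.heappush(q, (7, "B", ("A",)))
heapq.heappush(q, (2, "C", ("A",)))
heapq.heappush(q, (5, "D", ("A", "C")))

# heappop always returns the tuple with the smallest first element,
# regardless of insertion order:
print(heapq.heappop(q))  # (2, 'C', ('A',))
print(heapq.heappop(q))  # (5, 'D', ('A', 'C'))
```

This is why the main loop can hop between unrelated partial paths from one iteration to the next: it simply processes whatever the heap says is currently cheapest.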

If you want to know a bit more about this hidden magic of heappop and heappush, read the Wikipedia article on the subject.

Not optimal

Although the algorithm is correct, it is not an efficient implementation. The following statement causes the path to be copied, and that path might have up to n elements, so a single execution has a worst-case time complexity of O(n), giving the algorithm a worst-case time complexity of O(n² log n):

path += (v1,)

To avoid this, it is common not to keep track of the paths as a whole, but to only store a backreference to the previous node we came from. Then, when we hit the target node, we can walk back along these backreferences and build the path only once. As storing a backreference takes constant time, this improvement gives the algorithm a time complexity of O(n log n).
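As a rough sketch of that idea (the function name and details are my own, not from the answer), the same algorithm can carry just the predecessor of each node on the heap and rebuild the path once the target is reached:

```python
from collections import defaultdict
from heapq import heappop, heappush

def dijkstra_backref(edges, f, t):
    g = defaultdict(set)
    for l, r, c in edges:
        g[l].add((c, r))
        g[r].add((c, l))

    # Heap entries are (weight, node, predecessor); the source has no predecessor.
    q, seen = [(0, f, None)], set()
    prev = {}                        # backreference: node -> previous node
    while q:
        weight, v1, parent = heappop(q)
        if v1 in seen:
            continue
        seen.add(v1)
        prev[v1] = parent            # constant time, no tuple copying
        if v1 == t:
            # Walk the backreferences once to rebuild the path.
            path = []
            node = t
            while node is not None:
                path.append(node)
                node = prev[node]
            return weight, tuple(reversed(path))
        for k, v2 in g.get(v1, ()):
            if v2 not in seen:
                heappush(q, (weight + k, v2, v1))
    return float("inf")
```

For example, on edges = [("A", "B", 1), ("B", "C", 2), ("A", "C", 5)], dijkstra_backref(edges, "A", "C") should give (3, ("A", "B", "C")), matching the original function's result but without copying a path tuple on every push.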
