Detect cycle in a graph using Kruskal's algorithm

I'm implementing Kruskal's algorithm, which is a well-known approach to finding the minimum spanning tree of a weighted graph. However, I am adapting it to find cycles in a graph. This is the pseudocode for Kruskal's algorithm:

KRUSKAL(G):
   A = ∅
   foreach v ∈ G.V:
      MAKE-SET(v)
   foreach (u, v) ordered by weight(u, v), increasing:
      if FIND-SET(u) ≠ FIND-SET(v):
         A = A ∪ {(u, v)}
         UNION(u, v)
   return A

I'm having a hard time grasping the FIND-SET() and MAKE-SET() functions, or their implementation with the disjoint-set data structure.

My current code looks like this:

class edge {
    public:      //for quick access (temp) 
      char leftV;
      char rightV;
      int weight;
};

std::vector<edge> kruskalMST(std::vector<edge> edgeList){
    std::vector<char> set;
    std::vector<edge> result;
    std::size_t i = 0;
    sortList(edgeList);    //sorts according to weight (passed by reference)
    while(i < edgeList.size()){
        if(set.empty()){
            set.push_back(edgeList[i].leftV);    //also only push them when
            set.push_back(edgeList[i].rightV);   //they aren't there, will fix
            result.push_back(edgeList[i]);
            ++i;
        }
        else {
            //both endpoints already seen: treated as a cycle
            if((setContains(set, edgeList[i].leftV)) && (setContains(set, edgeList[i].rightV)))
                ++i; //skip edge
            else {
                set.push_back(edgeList[i].leftV);    //also only push them when
                set.push_back(edgeList[i].rightV);   //they aren't there, will fix
                result.push_back(edgeList[i]);
                ++i;
            }
        }
    }
    return result;
}

My code reports a cycle whenever both endpoints of an edge are already present in the set vector. This seemed to work in most cases until I encountered a situation like this:

  [a]              [c]
   |                |
   |                |
   |                |
  [b]              [d]

When the edges (a, b) and (c, d) come first in sorted order, a, b, c, and d have all been pushed into the set vector by the time a later edge is examined. Joining [a] to [c] then doesn't produce a cycle in the graph, but my current implementation reports one.

Is there any viable alternative to detect cycles in my case? Or if someone could explain how MAKE-SET, FIND-SET, and UNION work in Kruskal's algorithm, that would help a lot.

MAKE-SET(v) means that you're initializing a set consisting of only the vertex v. Initially, each vertex is in a set on its own.

FIND-SET(u) is a function that tells you which set a vertex belongs to. It must return a pointer or an ID number that uniquely identifies the set.

UNION(u, v) means that you merge the set containing u with the set containing v. In other words, if u and v are in different sets, the UNION operation will form a new set containing all the members of the sets FIND-SET(u) and FIND-SET(v).

When we implement these operations with the disjoint-set data structure, the key idea is that every set is represented by a leader. Every vertex has a pointer to some vertex in its set. The leader of the set is a vertex that points to itself. All other vertices point to a parent, and the pointers form a tree structure that has the leader as its root.

To implement FIND-SET(u), you follow pointers starting from u until you reach the set leader, which is the only vertex in the set that points to itself.

To implement UNION(u, v), you make the leader of one set point to the leader of the other set.
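As a rough illustration of the leader-pointer idea described so far, here is a minimal C++ sketch without any optimizations. It assumes vertices are numbered 0 to n - 1; the names `DisjointSet`, `find_set`, and `unite` are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type for illustration; vertices are assumed to be 0..n-1.
struct DisjointSet {
    std::vector<int> parent;  // parent[v] == v marks v as the leader of its set

    // MAKE-SET for every vertex: each vertex starts as its own leader.
    explicit DisjointSet(int n) : parent(n) {
        for (int v = 0; v < n; ++v) parent[v] = v;
    }

    // FIND-SET: follow parent pointers until reaching a vertex that points to itself.
    int find_set(int v) const {
        assert(v >= 0 && v < static_cast<int>(parent.size()));
        while (parent[v] != v) v = parent[v];
        return v;
    }

    // UNION: make the leader of one set point to the leader of the other.
    // Returns false if u and v were already in the same set.
    bool unite(int u, int v) {
        int lu = find_set(u);
        int lv = find_set(v);
        if (lu == lv) return false;
        parent[lv] = lu;
        return true;
    }
};
```

In Kruskal's algorithm, a `false` result from `unite` would mean the edge closes a cycle and should be skipped.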

These operations can be optimized with the ideas of union by rank and path compression.

Union by rank means that you keep track of the maximum number of pointers from any vertex in a set to the leader. That is the same as the height of the tree formed by the pointers. You can update the rank by carrying out a constant number of steps for every UNION operation, which is the only time a set's rank can change. Suppose that we are merging sets A and B such that A has a larger rank than B. We make the leader of B point to the leader of A. The resulting set has the same rank as A. If A has a smaller rank than B, we make the leader of A point to the leader of B, and the resulting set has the same rank as B. If A and B have the same rank, it doesn't matter which leader we choose. Whether we make the leader of A point to the leader of B or vice versa, the resulting set will have a rank that is one greater than the rank of A.
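The three cases above can be sketched as follows. This is only an illustration under the assumption that vertices are numbered 0 to n - 1; `RankedSets`, `find_set`, and `unite` are hypothetical names, and path compression is deliberately left out.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical structure for illustration; vertices are assumed to be 0..n-1.
struct RankedSets {
    std::vector<int> parent;  // parent[v] == v marks a leader
    std::vector<int> rank;    // height of the tree rooted at each leader

    explicit RankedSets(int n) : parent(n), rank(n, 0) {
        for (int v = 0; v < n; ++v) parent[v] = v;
    }

    int find_set(int v) const {
        assert(v >= 0 && v < static_cast<int>(parent.size()));
        while (parent[v] != v) v = parent[v];
        return v;
    }

    // Returns false if u and v were already in the same set.
    bool unite(int u, int v) {
        int a = find_set(u);
        int b = find_set(v);
        if (a == b) return false;
        if (rank[a] < rank[b]) std::swap(a, b);  // now rank[a] >= rank[b]
        parent[b] = a;                           // lower-rank leader points to the other
        if (rank[a] == rank[b]) ++rank[a];       // equal ranks: the result grows by one
        return true;
    }
};
```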

Path compression means that when we perform the FIND operation, which entails following a sequence of pointers to the leader of the set, we make all of the vertices we encounter along the way point directly to the leader. This increases the amount of work for the current FIND operation by only a constant factor, and it reduces the amount of work for future invocations of FIND.
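One possible way to write this is a two-pass loop: the first pass locates the leader, the second repoints every vertex on the path. The sketch assumes a parent array in which a leader points to itself; `find_compress` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: parent[v] == v marks a leader.
int find_compress(std::vector<int>& parent, int v) {
    assert(v >= 0 && v < static_cast<int>(parent.size()));

    // Pass 1: walk up to the leader.
    int leader = v;
    while (parent[leader] != leader) leader = parent[leader];

    // Pass 2: make every vertex on the path point directly to the leader.
    while (parent[v] != leader) {
        int next = parent[v];
        parent[v] = leader;
        v = next;
    }
    return leader;
}
```

The second pass touches the same vertices as the first, which is why the extra cost is only a constant factor.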

If you implement union by rank and path compression, you will have a blazingly fast union-find implementation. The time complexity for n operations is O(n α(n)), where α is the inverse Ackermann function. This function grows so slowly that even if n is the number of atoms in the universe, α(n) is less than 5. Thus, it is practically a constant, and the optimized disjoint-set data structure is practically a linear-time implementation of union-find.

I won't repeat the set-theoretic description of the union/find algorithm (Kruskal's algorithm is just one application of it). Instead, I'll use a simpler approach, to which you can then add union by rank and path compression.

For simplicity, I assume that each vertex has a unique integer ID ranging from 0 to order - 1 (so a vertex ID can be used as an index into an array).

The naive algorithm is so simple that the code speaks for itself:

int find(int v, int cc[]) {
  while (cc[v] >= 0)
    v = cc[v];
  return v;
}

bool edge_union(int v0, int v1, int cc[]) {
  int r0 = find(v0, cc);
  int r1 = find(v1, cc);
  if (r0 != r1) {
    cc[r1] = r0;
    return true;
  }
  return false;
}

The cc array is initialized with -1 everywhere, and its size equals the graph order (the number of vertices). A negative entry marks a root; any other entry is the index of the vertex's parent.
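To connect this back to the question, here is a small sketch that uses the two functions above for cycle detection. `find` and `edge_union` are copied unchanged; `has_cycle` is a hypothetical helper added for illustration, and it assumes the vertices a, b, c, d have been mapped to the IDs 0, 1, 2, 3.

```cpp
#include <cassert>
#include <utility>
#include <vector>

int find(int v, int cc[]) {
  while (cc[v] >= 0)
    v = cc[v];
  return v;
}

bool edge_union(int v0, int v1, int cc[]) {
  int r0 = find(v0, cc);
  int r1 = find(v1, cc);
  if (r0 != r1) {
    cc[r1] = r0;
    return true;
  }
  return false;
}

// Hypothetical helper: true if the undirected edge list contains a cycle.
// An edge closes a cycle exactly when both endpoints already share a root.
bool has_cycle(int order, const std::vector<std::pair<int, int>>& edges) {
  assert(order >= 0);
  std::vector<int> cc(order, -1);  // every vertex starts as its own root
  for (const auto& e : edges) {
    if (!edge_union(e.first, e.second, cc.data()))
      return true;
  }
  return false;
}
```

With the edges a-b, c-d, a-c from the question, `has_cycle` reports no cycle, because a and c have different roots when the third edge arrives. Adding b-d afterwards does close a cycle.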

Path compression can then be done by stacking the vertices encountered in the while loop of the find function and then setting the same representative for all of them.

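#include <deque>

int find2(int v, int cc[]) {
  std::deque<int> stack;  // vertices visited on the way to the root
  while (cc[v] >= 0) {
    stack.push_back(v);
    v = cc[v];
  }
  // v is now the root; point every visited vertex directly at it
  for (auto i : stack) {
    cc[i] = v;
  }
  return v;
}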

For union by rank, we reuse the negative values stored at the roots: the smaller (more negative) the value, the greater the rank. Here is the code:

bool edge_union2(int v0, int v1, int cc[]) {
  int r0 = find(v0, cc);
  int r1 = find(v1, cc);
  if (r0 == r1)
    return false;
  if (cc[r0] < cc[r1])
    cc[r1] = r0;
  else {
    if (cc[r1] < cc[r0])
      cc[r0] = r1;
    else {
      cc[r1] = r0;
      cc[r0] -= 1;
    }
  }
  return true;
}
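Putting the pieces together, Kruskal's algorithm from the question might look roughly like the sketch below. It keeps the question's `edge` fields and the negative-rank convention of the cc array, but `find_root` and `unite` are hypothetical vector-based variants written for this example, and the code assumes vertices are labelled with lowercase letters 'a' to 'z'.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct edge {
    char leftV;
    char rightV;
    int weight;
};

// cc[v] < 0 marks a root and encodes its rank; otherwise cc[v] is the parent.
static int find_root(int v, std::vector<int>& cc) {
    int root = v;
    while (cc[root] >= 0) root = cc[root];
    while (cc[v] >= 0 && cc[v] != root) {  // path compression
        int next = cc[v];
        cc[v] = root;
        v = next;
    }
    return root;
}

// Union by rank; returns false if both vertices already share a root.
static bool unite(int v0, int v1, std::vector<int>& cc) {
    int r0 = find_root(v0, cc);
    int r1 = find_root(v1, cc);
    if (r0 == r1) return false;
    if (cc[r1] < cc[r0]) std::swap(r0, r1);  // r0 now has the greater or equal rank
    if (cc[r0] == cc[r1]) --cc[r0];          // equal ranks: the merged rank grows by one
    cc[r1] = r0;
    return true;
}

// Assumption: vertex labels are 'a'..'z', mapped to indices 0..25.
std::vector<edge> kruskalMST(std::vector<edge> edgeList) {
    std::sort(edgeList.begin(), edgeList.end(),
              [](const edge& x, const edge& y) { return x.weight < y.weight; });
    std::vector<int> cc(26, -1);
    std::vector<edge> result;
    for (const edge& e : edgeList) {
        assert(e.leftV >= 'a' && e.leftV <= 'z');
        assert(e.rightV >= 'a' && e.rightV <= 'z');
        if (unite(e.leftV - 'a', e.rightV - 'a', cc))  // false means the edge closes a cycle
            result.push_back(e);
    }
    return result;
}
```

For a graph with other vertex labels, the letter-to-index mapping would need to be replaced, for example by a lookup table built from the edge list.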
