
Weighted quick-union with path compression algorithm: time complexity analysis

I'm learning the "Weighted quick-union with path compression" algorithm for a union–find structure. The Princeton course site explains the algorithm in detail. Here is the implementation in Java:

public class WQUPC {
  private int[] id;   // id[i] = parent of i; a root satisfies id[i] == i
  private int[] sz;   // sz[i] = number of nodes in the tree rooted at i

  public WQUPC(int N) {
    id = new int[N];
    sz = new int[N];
    for (int i = 0; i < N; i++) {
      id[i] = i;      // every element starts as the root of its own tree
      sz[i] = 1;
    }
  }

  int root(int i) {
    while (i != id[i]) {
      id[i] = id[id[i]];   // path compression (halving): point i at its grandparent
      i = id[i];
    }
    return i;
  }

  boolean connected(int p, int q) { return root(p) == root(q); }

  void union(int p, int q) {
    int i = root(p);
    int j = root(q);
    if (i == j) return;    // already connected; without this check sz gets double-counted
    if (sz[i] < sz[j]) {   // weighting: link the smaller tree under the larger one
      id[i] = j;
      sz[j] += sz[i];
    } else {
      id[j] = i;
      sz[i] += sz[j];
    }
  }
}
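
For reference, here is a minimal driver showing how the class can be exercised (a hypothetical demo, not part of the course code; it assumes it sits in the same package as WQUPC, since the methods are package-private):

public class WQUPCDemo {
  public static void main(String[] args) {
    WQUPC uf = new WQUPC(10);               // elements 0..9, each in its own component
    uf.union(4, 3);
    uf.union(3, 8);
    uf.union(6, 5);
    System.out.println(uf.connected(4, 8)); // true: 4-3-8 have been linked
    System.out.println(uf.connected(4, 5)); // false: still separate components
    uf.union(5, 8);
    System.out.println(uf.connected(4, 5)); // true after the last union
  }
}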

But as the website notes about its performance:

Theorem: Starting from an empty data structure, any sequence of M union and find operations on N objects takes O(N + M lg* N) time.

• Proof is very difficult.

• But the algorithm is still simple!

But I'm still curious about where the iterated logarithm lg* N comes from. How is it derived? Can someone prove it, or at least explain it in an intuitive way?
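
For concreteness, lg* n (the iterated logarithm) just counts how many times lg has to be applied before the value drops to 1. A tiny illustrative sketch (not from the course material):

public class IteratedLog {
  // lg*(n): how many times floor(lg) must be applied before n drops to 1.
  static int lgStar(long n) {
    int count = 0;
    while (n > 1) {
      n = 63 - Long.numberOfLeadingZeros(n);  // floor(lg n) for n >= 1
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(lgStar(2));      // 1   (2 -> 1)
    System.out.println(lgStar(16));     // 3   (16 -> 4 -> 2 -> 1)
    System.out.println(lgStar(65536));  // 4   (65536 -> 16 -> 4 -> 2 -> 1)
    // lg*(2^65536) would be 5, so for any realistic N the M lg* N term is
    // practically linear in M.
  }
}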

To begin with, your question has a slight mistake: the complexity with just path compression is only O(m log(n)) (without the iterated log). See, for example, exercise 21-4.4 in Introduction to Algorithms. In fact, your block

    if (sz[i] < sz[j]) {
      id[i] = j;
      sz[j] += sz[i];
    } else {
      id[j] = i;
      sz[i] += sz[j];
    }

does union by rank (strictly speaking, union by size, since it compares subtree sizes rather than ranks, but it serves the same purpose in the analysis).
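
For comparison, a rank-based union might look like the following sketch (my own illustration, not the question's code; rank[i] upper-bounds the height of the tree rooted at i and only grows when two equal-rank roots are merged):

public class UnionByRank {
  private final int[] id;    // parent links; a root satisfies id[i] == i
  private final int[] rank;  // rank of each root, all 0 initially

  public UnionByRank(int n) {
    id = new int[n];
    rank = new int[n];
    for (int i = 0; i < n; i++) id[i] = i;
  }

  int root(int i) {
    while (i != id[i]) {
      id[i] = id[id[i]];     // same path compression (halving) as in the question
      i = id[i];
    }
    return i;
  }

  void union(int p, int q) {
    int i = root(p), j = root(q);
    if (i == j) return;
    if (rank[i] < rank[j])      id[i] = j;   // attach the lower-rank root below
    else if (rank[i] > rank[j]) id[j] = i;
    else { id[j] = i; rank[i]++; }           // equal ranks: either root works, rank grows
  }
}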

With both union by rank and path compression, though, the bound you quoted can be proved fairly easily (much more easily than the inverse Ackermann one). The proof is based on three points:

  1. On each leaf-to-root path, the ranks of the nodes are strictly increasing. This relies on union by rank, by the way, and with it, it is very easy to show.

  2. If the root of a tree has rank r, the tree has at least 2^r nodes. This can be shown by induction.

Based on 2., it's possible to show

  3. The maximal number of nodes with rank r is at most n / 2^r (a short sketch of points 2 and 3 follows this list).
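
Here is that short sketch of the reasoning behind points 2 and 3 (my own wording of the standard argument, not the original answer's):

    Point 2, by induction on r:
      base:  a rank-0 root is a single node, and 2^0 = 1;
      step:  a root only reaches rank r+1 when two rank-r roots are linked,
             so the new tree has at least 2^r + 2^r = 2^(r+1) nodes.

    Point 3: a node's rank is frozen once it stops being a root, and the >= 2^r
    nodes that were below it when it reached rank r can never be counted again
    for another rank-r node (ranks strictly increase toward the root, by point 1).
    Disjoint sets of size >= 2^r therefore account for all rank-r nodes, so there
    are at most n / 2^r of them.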

The rest of the proof now follows by a clever arrangement of the worst possible way the ranks could be organized, which shows that not too many of them can be expensive. For further details, see the Wikipedia entry on this.
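
For what it's worth, here is a compressed version of that counting argument (my own summary of the standard rank-group proof, along the lines of the Wikipedia article):

    Group the ranks into blocks {1}, {2}, {3, 4}, {5, ..., 16}, {17, ..., 2^16}, ...;
    each block runs from k+1 up to 2^k. Since ranks never exceed lg n, there are at
    most lg* n blocks.

    During a find, charge each step from a node to its parent as follows:

      - if the parent's rank lies in a higher block than the node's rank, charge the
        find itself; ranks increase along the path (point 1), so a single find can be
        charged at most lg* n times this way;

      - otherwise charge the node; after the step, path compression gives the node a
        parent of strictly higher rank, so a node whose rank lies in the block ending
        at 2^k can be charged at most 2^k times before its parent's rank leaves the
        block.

    By point 3, the nodes whose ranks lie in the block (k, 2^k] number at most
    n/2^(k+1) + n/2^(k+2) + ... < n/2^k, so all node charges within one block sum to
    at most (n/2^k) * 2^k = n. Over at most lg* n blocks that is O(n lg* n), and the
    find charges add O(m lg* n), which is essentially the quoted O(N + M lg* N) bound.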

As I recall, the proof had to do with amortizing the cost of the path compression over a set of searches. Shallow searches are cheap, and don't incur the cost of much path compression; deep searches cost a lot for that search and the path compression, but the path compression makes subsequent searches much cheaper on average.
