简体   繁体   中英

path compression is enough for disjoint-set forests , why do we need union by rank

MAKE-SET(x)
    x.p = x
    x.rank = 0

UNION(x, y)
     LINK(FIND-SET(x),FIND-SET(y))

LINK(x, y)
    if x.rank > y.rank
        y.p = x
    else 
        x.p = y
        if x.rand == y.rank
            y.rank = y.rank +1

The FIND-SET procedure with path compression is quite simple:
FIND-SET(x)
    if x != x.p
        x.p = FIND-SET(x.p)
    return x.p

You can find the pseudocode in Introduction to Algorithms 3rd at chapter 21.

this is the pseudocode for disjoint-set forests with rank and path compression. From the pseudocode, we can see that each time before union operation, we will first find the set number of each node. In the FIND-SET operation with path compression, the height of x and y will always become only 2. Because xp and yp will both point to the set's root after FIND-SET. Why union by rank still needed ?


Shihab Shahriar have solved my problem, his answer is really impressive!

Note that we apply path compression only when we perform find-set operation, and path compression can't be applied when we perform the union of two sets.

In union by rank, we take care of the fact that the root of the tree with less rank (or less depth/height) is made to point to the root of the tree with the high rank (or more depth/height). This makes sure that the tree representing the set never gets skewed.

An example where union by rank is performed:

depth=1,n=2 depth=0,n=1 depth=1,n=3 OUO = O / / \\ OOO

If we didn't perform union by rank then the tree might become something like this:

depth=1,n=2 depth=0,n=1 depth=2,n=3 OUO = O / / OO / O ie it's height is increased.

You can do an amortized analysis and calculate the time complexity of find-set when union by rank is applied then you will find that time is never going to exceed O(log2(n)) .

So, if you don't perform union by rank then the find-set operation will take O(d) time ( d represents the depth of tree) and in worst case d can become n (number of elements in the set). So for find-set operation, the time complexity would become O(n) for the first time. However, for the next-subsequent find-set operation the time may get decreased, but the point is we don't want O(n) time for any find-set operation. Thus, for the case when there are several union operations and at the end, there is a find-set operation then O(n) time would be consumed if you don't use union by rank .

We use both the union by rank and path compression together because we want to decrease the height of the tree and make it less than log2(n) (n is the number of elements in the disjoint-set), the ultimate goal is to make the height of the tree to almost one.

Yes, depth of x and y will be 2, but that doesn't mean the height of root has changed.

Note that we are never actually decrementing the rank of any root. Say a root of rank 4 has 10 leaves node ie 10 nodes are 3 edges away in this particular set. Now through path compression, you may have promoted one of these 10 to depth of 2, like x and y here. But there are still 9 left and rank of root is still 4. Now if we were to link this set with another set of rank 1, we certainly don't want our current root to be child of that new set, because it'll turn depth of other 9 nodes to 5. That's why we still need rank in addition to path compression.

Now if somehow all of these 10 leaf nodes got promoted to depth 2, we should decrement rank of root to reflect it's proper height, but the added complexity of doing that doesn't seem to be worth the trouble.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM