
Why does finding the median of two sorted arrays of different sizes take O(log(min(n,m)))?

Please consider this problem:

We have two sorted arrays of different sizes, A[n] and B[m]. I have implemented a classical algorithm that takes at most O(log(min(n,m))) steps.

Here's the approach: partition the two arrays together into two halves (not two equal parts of each array, but two groups with the same total number of elements). The first half contains some leading elements of the first array and some leading elements of the second array; the second half contains the remaining (trailing) elements of both arrays. Because the arrays can have different sizes, this does not mean taking half of each array. The goal is to reach a partition in which every element in the first half is less than or equal to every element in the second half.
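
To make the target condition concrete, here is a brute-force sketch that tries every possible split and checks the condition literally, comparing every left element with every right element. The names `splitIsValid` and `validSplits` are hypothetical, the sketch assumes A is the smaller array (A.size() <= B.size()), and it gives the left half (n + m + 1) / 2 elements to match the `py` formula in the code below.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: with the first x elements of A and the first y elements
// of B as the "left half", is every left element <= every right element?
// Checked literally, in O((n + m)^2) time.
bool splitIsValid(const std::vector<int>& A, const std::vector<int>& B,
                  std::size_t x, std::size_t y)
{
    std::vector<int> left, right;
    left.insert(left.end(), A.begin(), A.begin() + x);
    left.insert(left.end(), B.begin(), B.begin() + y);
    right.insert(right.end(), A.begin() + x, A.end());
    right.insert(right.end(), B.begin() + y, B.end());
    for (int l : left)
        for (int r : right)
            if (l > r) return false;
    return true;
}

// Try every x in [0, A.size()], assuming A.size() <= B.size(). The left half
// always holds (n + m + 1) / 2 elements, so y is determined by x.
std::vector<std::size_t> validSplits(const std::vector<int>& A,
                                     const std::vector<int>& B)
{
    const std::size_t half = (A.size() + B.size() + 1) / 2;
    std::vector<std::size_t> result;
    for (std::size_t x = 0; x <= A.size(); ++x)
        if (splitIsValid(A, B, x, half - x))
            result.push_back(x);
    return result;
}
```

For A = {1, 3, 8} and B = {2, 4, 5, 9} only x = 2 works (left half {1, 3, 2, 4}). With duplicates, several splits can be valid at once, which is harmless: any of them yields the same median.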

Please see the code below:

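#include <algorithm>
#include <limits>
#include <stdexcept>
#include <vector>

double median(std::vector<int> V1, std::vector<int> V2)
{
    // Always binary-search the smaller array.
    if (V1.size() > V2.size())
    {
        V1.swap(V2);
    }
    int s1 = static_cast<int>(V1.size());
    int s2 = static_cast<int>(V2.size());
    if (s1 + s2 == 0)
    {
        throw std::invalid_argument("both arrays are empty");
    }
    // Sentinels for an empty side of a partition.
    const int MIN = std::numeric_limits<int>::min();
    const int MAX = std::numeric_limits<int>::max();
    int low = 0;
    int high = s1;
    while (low <= high)
    {
        int px = (low + high) / 2;          // elements of V1 in the left half
        int py = (s1 + s2 + 1) / 2 - px;    // elements of V2 in the left half

        int maxLeftX = (px == 0) ? MIN : V1[px - 1];
        int minRightX = (px == s1) ? MAX : V1[px];

        int maxLeftY = (py == 0) ? MIN : V2[py - 1];
        int minRightY = (py == s2) ? MAX : V2[py];

        if (maxLeftX <= minRightY && maxLeftY <= minRightX)
        {
            if ((s1 + s2) % 2 == 0)
            {
                return (double(std::max(maxLeftX, maxLeftY)) + double(std::min(minRightX, minRightY))) / 2;
            }
            else
            {
                return std::max(maxLeftX, maxLeftY);
            }
        }
        else if (maxLeftX > minRightY)
        {
            high = px - 1;                  // took too many elements from V1
        }
        else
        {
            low = px + 1;                   // took too few elements from V1
        }
    }
    // Only reachable if the inputs are not sorted.
    throw std::invalid_argument("input arrays must be sorted");
}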

Although the approach is pretty straightforward and it works, I still cannot convince myself of its correctness. Furthermore, I can't understand why it takes O(log(min(n,m))) steps.

If anyone can briefly explain its correctness and why it takes O(log(min(n,m))) steps, that would be awesome. Even a link to a meaningful explanation would help.

The time complexity is quite straightforward: you binary search through the array with fewer elements to find a partition that lets you read off the median. The candidates are the possible numbers of elements taken from the smaller array, 0 to min(n, m), and every guess at least halves the range of candidates still in play. So you make O(log(min(n, m))) steps, and the complexity is O(log(min(n, m))).
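
The step count can be checked without any median-specific logic: the search is an ordinary binary search over the n + 1 candidate values of x (0 to n, with n the size of the smaller array), using the same midpoint rule `(low + high) / 2` as the question's `px`. The sketch below (hypothetical function `worstCaseProbes`) counts, for every possible position of the correct x, how many guesses that search needs and reports the maximum.

```cpp
#include <algorithm>

// Hypothetical helper: the largest number of midpoint guesses a binary search
// over the candidates 0..n needs to hit the target, taken over every target.
// n is the size of the smaller array.
int worstCaseProbes(int n)
{
    int worst = 0;
    for (int target = 0; target <= n; ++target)
    {
        int low = 0, high = n, probes = 0;
        while (low <= high)
        {
            int mid = (low + high) / 2;        // same midpoint rule as px
            ++probes;
            if (mid == target) break;          // valid partition found
            if (mid > target) high = mid - 1;  // "too many" from the smaller array
            else low = mid + 1;                // "too few" from the smaller array
        }
        worst = std::max(worst, probes);
    }
    return worst;
}
```

The result is exactly floor(log2(n + 1)) + 1, i.e. O(log(min(n, m))). With duplicate values several x may be valid, which can only make the real search stop earlier.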

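Let n <= m be the sizes of the smaller and the larger array. The median splits the n + m elements into two halves of about (n + m)/2 elements each: everything in the lower half is less than or equal to the median, and everything in the upper half is greater than or equal to it. Let's think about them as two halves (when n + m is odd, let the median belong to the half of your choice).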

You can certainly split the smaller array into two subarrays such that one of them lies entirely in the first half and the other in the second half. However, you don't know in advance how many elements each of them has.

Let's choose some x, your guess for the number of elements from the smaller array that go into the first half. It must be in the range 0 to n. Since the first half has a fixed size ((n + m + 1)/2 elements in the question's code, which puts the extra element on the left when n + m is odd), you then have to take (n + m + 1)/2 - x elements from the larger array. Then you have to check whether that partition actually works.

To check whether a partition is good, you have to verify that every element in the left half is less than or equal to every element in the right half. Because both arrays are sorted, the order within each array is already guaranteed, so only the two comparisons across the cut remain: maxLeftX <= minRightY and maxLeftY <= minRightX. If both hold, every element in the left half is less than or equal to every element in the right half.
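
A minimal sketch of that constant-time test follows. `boundaryCheckOK` is a hypothetical name; x is the number of elements taken from the smaller array A, y = (n + m + 1)/2 - x as in the question's `py`, and `INT_MIN`/`INT_MAX` stand in for the `MIN`/`MAX` sentinels used when one side of a cut is empty.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: O(1) partition test. Assumes A and B are sorted,
// A.size() <= B.size(), and 0 <= x <= A.size().
bool boundaryCheckOK(const std::vector<int>& A, const std::vector<int>& B,
                     std::size_t x)
{
    const std::size_t y = (A.size() + B.size() + 1) / 2 - x;  // taken from B
    const int maxLeftX  = (x == 0)        ? INT_MIN : A[x - 1];
    const int minRightX = (x == A.size()) ? INT_MAX : A[x];
    const int maxLeftY  = (y == 0)        ? INT_MIN : B[y - 1];
    const int minRightY = (y == B.size()) ? INT_MAX : B[y];
    // Order inside A and inside B is guaranteed by sorting,
    // so only the two cross comparisons can fail.
    return maxLeftX <= minRightY && maxLeftY <= minRightX;
}
```

It accepts exactly the splits that an element-by-element comparison of the two halves would accept, but in O(1) instead of O((n + m)^2).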

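If so, you've found a correct partition and can read off the median directly. With the left half holding (n + m + 1)/2 elements, the median is max(maxLeftX, maxLeftY) when n + m is odd, and (max(maxLeftX, maxLeftY) + min(minRightX, minRightY)) / 2 when n + m is even. (If you let the median belong to the right half instead, the odd case uses min(minRightX, minRightY).)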
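
Reading off the median from the four boundary values can be sketched as below. `medianFromSplit` is a hypothetical name, and it assumes the question's convention that the left half holds (total + 1)/2 elements.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: median once a valid split is known. For an odd total
// the median is the largest element of the left half.
double medianFromSplit(int maxLeftX, int maxLeftY, int minRightX, int minRightY,
                       std::size_t total)
{
    const int leftMax = std::max(maxLeftX, maxLeftY);
    if (total % 2 == 1)
        return leftMax;
    const int rightMin = std::min(minRightX, minRightY);
    // Convert to double before adding so that large ints cannot overflow.
    return (static_cast<double>(leftMax) + static_cast<double>(rightMin)) / 2.0;
}
```

For A = {1, 3, 8}, B = {2, 4, 5, 9} and x = 2, the boundaries are 3, 4, 8, 5 and the median is 4.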

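If not, you took either too many elements from the smaller array (the case maxLeftX > minRightY), so next time you have to guess a smaller value for x, or too few of them (maxLeftY > minRightX), so you have to guess a greater value for x. This is also why the algorithm is correct: increasing x can only raise maxLeftX and lower minRightY, so a "too many" verdict stays "too many" for every larger x, and likewise a "too few" verdict stays "too few" for every smaller x. The valid x therefore always lies on the side you move towards, and it is never discarded.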
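
The monotonicity can be observed directly. As x grows, A[x-1] can only grow and B[y] (with y = (n + m + 1)/2 - x) can only shrink, so maxLeftX > minRightY switches at most once, from false to true; symmetrically maxLeftY > minRightX switches at most once, from true to false. Both cannot hold at the same x, since that would give A[x] >= A[x-1] > B[y] >= B[y-1] > A[x] (where those elements exist). The sketch below (hypothetical `verdict` and `verdictsForAllX`, with `INT_MIN`/`INT_MAX` standing in for the question's `MIN`/`MAX`) lists the verdict for every x.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: +1 = too few taken from A (increase x),
// 0 = valid split, -1 = too many taken from A (decrease x).
// Assumes sorted inputs, A.size() <= B.size() and 0 <= x <= A.size().
int verdict(const std::vector<int>& A, const std::vector<int>& B, std::size_t x)
{
    const std::size_t y = (A.size() + B.size() + 1) / 2 - x;
    const int maxLeftX  = (x == 0)        ? INT_MIN : A[x - 1];
    const int minRightX = (x == A.size()) ? INT_MAX : A[x];
    const int maxLeftY  = (y == 0)        ? INT_MIN : B[y - 1];
    const int minRightY = (y == B.size()) ? INT_MAX : B[y];
    if (maxLeftX > minRightY) return -1;  // same branch order as the question's code
    if (maxLeftY > minRightX) return +1;
    return 0;
}

// Verdicts for x = 0..A.size(); for sorted input the sequence is
// non-increasing: some +1s, then at least one 0, then some -1s.
std::vector<int> verdictsForAllX(const std::vector<int>& A, const std::vector<int>& B)
{
    std::vector<int> v;
    for (std::size_t x = 0; x <= A.size(); ++x)
        v.push_back(verdict(A, B, x));
    return v;
}
```

Because the verdicts form this +1, 0, -1 pattern, a binary search that moves left on -1 and right on +1 cannot skip over the zeros.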

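To get the best complexity, always guess in the middle of the range of values x may still take. That is a binary search: each wrong guess at least halves the remaining range of n + 1 candidates, so you reach a valid x after at most floor(log2(n + 1)) + 1 guesses.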
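
If you want to convince yourself of correctness empirically, you can compare the binary-search answer with a straightforward merge-based median on many small inputs. The sketch below restates the search described above as `medianBySearch` and uses `std::merge` only for the reference `medianByMerge`; both names are hypothetical, and throwing on two empty arrays is an assumption, since the question leaves that case undefined.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

// Reference: merge both arrays in O(n + m) and read the middle element(s).
double medianByMerge(const std::vector<int>& A, const std::vector<int>& B)
{
    std::vector<int> all;
    std::merge(A.begin(), A.end(), B.begin(), B.end(), std::back_inserter(all));
    if (all.empty()) throw std::invalid_argument("both arrays are empty");
    const std::size_t mid = all.size() / 2;
    if (all.size() % 2 == 1) return all[mid];
    return (static_cast<double>(all[mid - 1]) + static_cast<double>(all[mid])) / 2.0;
}

// Binary search over x, the number of elements of the smaller array in the left half.
double medianBySearch(std::vector<int> A, std::vector<int> B)
{
    if (A.size() > B.size()) A.swap(B);          // search the smaller array
    const int n = static_cast<int>(A.size());
    const int m = static_cast<int>(B.size());
    if (n + m == 0) throw std::invalid_argument("both arrays are empty");
    int low = 0, high = n;                        // candidates for x: 0..n
    while (low <= high)
    {
        const int x = (low + high) / 2;
        const int y = (n + m + 1) / 2 - x;        // left half has (n + m + 1) / 2 elements
        const int maxLeftX  = (x == 0) ? INT_MIN : A[x - 1];
        const int minRightX = (x == n) ? INT_MAX : A[x];
        const int maxLeftY  = (y == 0) ? INT_MIN : B[y - 1];
        const int minRightY = (y == m) ? INT_MAX : B[y];
        if (maxLeftX <= minRightY && maxLeftY <= minRightX)
        {
            const int leftMax = std::max(maxLeftX, maxLeftY);
            if ((n + m) % 2 == 1) return leftMax;
            return (static_cast<double>(leftMax) + std::min(minRightX, minRightY)) / 2.0;
        }
        if (maxLeftX > minRightY) high = x - 1;  // too many from A
        else low = x + 1;                         // too few from A
    }
    throw std::invalid_argument("inputs must be sorted");
}
```

Running both on all small size combinations, with many duplicates, exercises the edge cases (empty smaller array, x = 0, x = n) that are easiest to get wrong.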
