
Why finding the median of 2 sorted arrays of different sizes takes O(log(min(n,m)))

Please consider this problem:

We have 2 sorted arrays of different sizes, A[n] and B[m]. I have implemented a classical algorithm that takes at most O(log(min(n,m))) steps.

Here's the approach: partition the two arrays together into two halves (not two parts per array, but two groups that contain the same number of elements, or differ by one when the total is odd). The first half contains some leading elements of the first and second arrays, and the second half contains the remaining (trailing) elements of the first and second arrays. Because the arrays can have different sizes, this does not mean taking half of each array. The goal is to reach a partition in which every element in the first half is less than or equal to every element in the second half.

Please see the code below:

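#include <algorithm>
#include <limits>
#include <stdexcept>
#include <vector>

// Sentinels that stand in for "no element on this side of the partition".
const int MIN = std::numeric_limits<int>::min();
const int MAX = std::numeric_limits<int>::max();

// Both vectors must be sorted in ascending order and must not both be empty.
double median(std::vector<int> V1, std::vector<int> V2)
{
    // The binary search runs over the smaller array, so make V1 the smaller one.
    if (V1.size() > V2.size())
    {
        V1.swap(V2);
    }
    if (V2.empty())
    {
        throw std::invalid_argument("median: both inputs are empty");
    }
    int s1 = static_cast<int>(V1.size());
    int s2 = static_cast<int>(V2.size());
    int low = 0;
    int high = s1;
    while (low <= high)
    {
        // px elements of V1 and py elements of V2 form the left half.
        int px = (low + high) / 2;
        int py = (s1 + s2 + 1) / 2 - px;

        int maxLeftX = (px == 0) ? MIN : V1[px - 1];
        int minRightX = (px == s1) ? MAX : V1[px];

        int maxLeftY = (py == 0) ? MIN : V2[py - 1];
        int minRightY = (py == s2) ? MAX : V2[py];

        if (maxLeftX <= minRightY && maxLeftY <= minRightX)
        {
            // Valid partition: the median sits at the boundary.
            if ((s1 + s2) % 2 == 0)
            {
                return (double(std::max(maxLeftX, maxLeftY)) + double(std::min(minRightX, minRightY))) / 2;
            }
            else
            {
                return std::max(maxLeftX, maxLeftY);
            }
        }
        else if (maxLeftX > minRightY)
        {
            // Too many elements taken from V1: move left.
            high = px - 1;
        }
        else
        {
            // Too few elements taken from V1: move right.
            low = px + 1;
        }
    }
    // Only reachable when the inputs are not sorted.
    throw std::invalid_argument("median: inputs must be sorted");
}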

Although the approach is pretty straightforward and it works, I still cannot convince myself of its correctness. Furthermore, I can't understand why it takes O(log(min(n,m))) steps.

If anyone can briefly explain the correctness and why it takes O(log(min(n,m))) steps, that would be awesome. A link to a meaningful explanation would also help.

Time complexity is quite straightforward: you binary search through the array with fewer elements to find a partition that lets you read off the median. You make O(log(#elements)) steps, and since #elements is min(n, m), the complexity is O(log(min(n, m))).
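To make the step count concrete, here is a minimal sketch that counts loop iterations for a binary search over the candidate values 0..n, where n stands for the size of the smaller array. The function name worstCaseSteps is made up for this illustration and is not part of the question's code; it always continues into the right side, which is never smaller than the left side, so it follows the longest possible path.

```cpp
#include <cassert>

// Hypothetical helper (not from the original post): worst-case number of
// iterations of a binary search over the n + 1 candidate values 0..n.
int worstCaseSteps(int n)
{
    assert(n >= 0);
    int low = 0;
    int high = n;
    int steps = 0;
    while (low <= high)
    {
        int mid = low + (high - low) / 2;
        ++steps;
        // Continue into the right side, which holds at least as many
        // candidates as the left side, so this is the longest path.
        low = mid + 1;
    }
    return steps;
}
```

For example, worstCaseSteps(1000) evaluates to 10 and worstCaseSteps(1000000) to 20: the count grows with the logarithm of the smaller array's size and does not depend on the larger array at all.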

About (n + m)/2 elements lie on each side of the median. Think of them as two halves (let the median itself belong to whichever half you choose; the code in the question puts it in the first half, which therefore holds (s1 + s2 + 1)/2 elements using integer division).

You can always split the smaller array into two subarrays such that one lies entirely in the first half and the other lies entirely in the second half. However, you do not know in advance how many elements each of them contains.

Choose some x, your guess for the number of elements of the smaller array that belong to the first half. It must be in the range from 0 to n, where n is the size of the smaller array. Since the first half holds (n + m)/2 elements (rounded up in the question's code), you have to take (n + m)/2 - x elements from the bigger array. Then you have to check whether that partition actually works.

To check whether a partition is good, you have to verify that every element in the smaller half is less than or equal to every element in the greater half. Because both arrays are sorted, it is enough to compare the boundary elements: check that maxLeftX <= minRightY and that maxLeftY <= minRightX. If both hold, every element in the left half is less than or equal to every element in the right half.
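The boundary check can be isolated in a small function. This is only a sketch: isValidPartition is a hypothetical helper name, and it assumes both vectors are sorted in ascending order and that INT_MIN and INT_MAX are acceptable stand-ins for a missing boundary element.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): returns true when the
// first x elements of a plus the first y elements of b form a left half in
// which every element is <= every element of the right half.
bool isValidPartition(const std::vector<int>& a, std::size_t x,
                      const std::vector<int>& b, std::size_t y)
{
    assert(x <= a.size() && y <= b.size());
    // Sentinels stand in for "no element on this side".
    int maxLeftA = (x == 0) ? INT_MIN : a[x - 1];
    int minRightA = (x == a.size()) ? INT_MAX : a[x];
    int maxLeftB = (y == 0) ? INT_MIN : b[y - 1];
    int minRightB = (y == b.size()) ? INT_MAX : b[y];
    return maxLeftA <= minRightB && maxLeftB <= minRightA;
}
```

With a = {1, 3, 8} and b = {2, 4, 6, 7, 9}, the left half must hold 4 of the 8 elements. Taking x = 2 and y = 2 gives the left half {1, 3, 2, 4} and passes the check; x = 3, y = 1 fails because 8 > 4, and x = 1, y = 3 fails because 6 > 3.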

If so, you've found the correct partition, and the median follows directly. With the convention used in the question's code, it is max(maxLeftX, maxLeftY) when the total number of elements is odd, and the average of max(maxLeftX, maxLeftY) and min(minRightX, minRightY) when it is even.

If not, you either took too many elements from the smaller array (the case maxLeftX > minRightY), so the next guess for x has to be smaller, or too few (the case maxLeftY > minRightX), so the next guess for x has to be greater.
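Why it is safe to throw away half of the candidates after each failed guess can be written out as a short monotonicity argument. The notation below is introduced only for this sketch and is not from the original post: A is the smaller array (size n), B the larger one (size m), k is the number of elements in the first half, arrays are 0-indexed, and a missing boundary element counts as -infinity on the left and +infinity on the right.

```latex
\[
P(x):\; A[x-1] > B[k-x] \quad\text{(too many elements taken from } A\text{)},
\qquad
Q(x):\; B[k-x-1] > A[x] \quad\text{(too few elements taken from } A\text{)}.
\]
As $x$ grows, $A[x-1]$ and $A[x]$ never decrease, while $B[k-x]$ and
$B[k-x-1]$ never increase. Therefore
\[
P(x) \Rightarrow P(x') \ \text{for all } x' > x,
\qquad
Q(x) \Rightarrow Q(x') \ \text{for all } x' < x .
\]
$P(x)$ and $Q(x)$ cannot hold together, because that would give
\[
A[x-1] > B[k-x] \ge B[k-x-1] > A[x] \ge A[x-1],
\]
a contradiction.
```

So the candidates 0..n fall into three consecutive blocks: values where Q holds (x too small), values where the partition is valid, and values where P holds (x too large). The middle block is never empty for sorted input, because cutting the merged order after its first k elements yields a valid partition. A failed guess therefore tells you on which side the valid block lies. Choosing A as the smaller array also guarantees that k - x stays between 0 and m for every x in 0..n.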

To get the best complexity, always guess the middle of the range of values that x can still take.
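If the argument still feels abstract, one way to build confidence is to compare the binary-search result against a brute-force reference on many small inputs. A minimal sketch of such a reference follows; medianByMerging is a hypothetical name, it runs in O(n + m) time, and it assumes both inputs are sorted and not both empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical reference implementation (not from the original post):
// merge both sorted inputs and read the median from the middle.
double medianByMerging(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(!a.empty() || !b.empty());
    std::vector<int> merged;
    merged.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(merged));
    const std::size_t total = merged.size();
    if (total % 2 == 1)
    {
        // Odd total: the single middle element.
        return merged[total / 2];
    }
    // Even total: average of the two middle elements.
    return (static_cast<double>(merged[total / 2 - 1]) + merged[total / 2]) / 2.0;
}
```

For {1, 3, 8} and {2, 4, 6, 7, 9} this yields 5.0, the average of 4 and 6, which is the value the partition x = 2, y = 2 produces as well.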

