Understanding the algorithm of Median of Two Sorted Arrays

There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log(m + n)).

I don't understand the formulas for calculating aMid and bMid. What's the logic behind these formulas?

int aMid = aLen * k / (aLen + bLen); // a's middle count

int bMid = k - aMid - 1; // b's middle count

Here is the link to the program: http://www.programcreek.com/2012/12/leetcode-median-of-two-sorted-arrays-java/

public static double findMedianSortedArrays(int A[], int B[]) {
    int m = A.length;
    int n = B.length;

    if ((m + n) % 2 != 0) // odd
        return (double) findKth(A, B, (m + n) / 2, 0, m - 1, 0, n - 1);
    else { // even
        return (findKth(A, B, (m + n) / 2, 0, m - 1, 0, n - 1) 
            + findKth(A, B, (m + n) / 2 - 1, 0, m - 1, 0, n - 1)) * 0.5;
    }
}

public static int findKth(int A[], int B[], int k, 
    int aStart, int aEnd, int bStart, int bEnd) {

    int aLen = aEnd - aStart + 1;
    int bLen = bEnd - bStart + 1;

    // Handle special cases
    if (aLen == 0)
        return B[bStart + k];
    if (bLen == 0)
        return A[aStart + k];
    if (k == 0)
        return A[aStart] < B[bStart] ? A[aStart] : B[bStart];

    int aMid = aLen * k / (aLen + bLen); // a's middle count    
                                      // I AM STUCK HERE

    int bMid = k - aMid - 1; // b's middle count

    // make aMid and bMid to be array index
    aMid = aMid + aStart;
    bMid = bMid + bStart;

    if (A[aMid] > B[bMid]) {
        k = k - (bMid - bStart + 1);
        aEnd = aMid;
        bStart = bMid + 1;
    } else {
        k = k - (aMid - aStart + 1);
        bEnd = bMid;
        aStart = aMid + 1;
    }

    return findKth(A, B, k, aStart, aEnd, bStart, bEnd);
}

From the comments in the code I got some idea of how these formulas are calculated, but I still can't explain to someone why these particular formulas are used. What is the logic behind them?

For int aMid = aLen * k / (aLen + bLen); // a's middle count

Start from aMid = aLen / 2 --(i)

and k = (aLen + bLen) / 2, which gives 2 = (aLen + bLen) / k.

Putting this value of 2 into equation (i):

aMid = aLen / ((aLen + bLen) / k) = aLen * k / (aLen + bLen)

And for int bMid = k - aMid - 1; // b's middle count

aMid + bMid + 1 = k must be satisfied to be able to draw the conclusions the code draws when A[aMid] > B[bMid].
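As a quick sanity check on both formulas, here is a minimal sketch that computes aMid and bMid as offsets within the current windows for one sample input. The class name SplitCheck and the method split are made up for illustration. It assumes both windows are non-empty and 1 <= k < aLen + bLen, which is what the special cases at the top of findKth leave over.

```java
final class SplitCheck {
    /** Returns {aMid, bMid} as offsets into the current windows of A and B. */
    static int[] split(int aLen, int bLen, int k) {
        // aMid sits the same fraction of the way through A's window
        // as k sits through the two windows combined (rounded down).
        int aMid = aLen * k / (aLen + bLen);
        // bMid is chosen so that aMid + bMid + 1 == k.
        int bMid = k - aMid - 1;
        return new int[] {aMid, bMid};
    }

    public static void main(String[] args) {
        int[] s = split(4, 5, 4);
        System.out.println("aMid=" + s[0] + ", bMid=" + s[1]); // prints "aMid=1, bMid=2"
    }
}
```

Under those assumptions aLen * k / (aLen + bLen) is strictly smaller than both aLen and k, so 0 <= aMid < aLen and 0 <= bMid < bLen, and neither index can run off its window.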

As for why aMid + bMid + 1 = k is significant: if A[aMid] is greater than B[bMid], you know that no element after A[aMid] in A can be the kth element, since there are too many elements in B lower than it (counting them would exceed k elements). You also know that B[bMid] and every element before B[bMid] in B can't be the kth element, since there are too few elements in A lower than it (there wouldn't be enough elements before B[bMid] for it to be the kth element).
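The discard rule can be traced on a small input. The sketch below performs a single narrowing step with the same arithmetic as findKth and returns the new window. The class name NarrowStep, the method step, and the sample arrays are assumptions made for this illustration.

```java
final class NarrowStep {
    /**
     * Performs one narrowing step of findKth and returns the new
     * {k, aStart, aEnd, bStart, bEnd}.
     * Assumes both windows are non-empty and 1 <= k < aLen + bLen.
     */
    static int[] step(int[] a, int[] b, int k,
                      int aStart, int aEnd, int bStart, int bEnd) {
        int aLen = aEnd - aStart + 1;
        int bLen = bEnd - bStart + 1;
        int aMid = aStart + aLen * k / (aLen + bLen);
        int bMid = bStart + (k - (aMid - aStart) - 1);
        if (a[aMid] > b[bMid]) {
            // B[bStart..bMid] all rank below k, so drop them and lower k;
            // nothing after A[aMid] can have rank k, so cut A there.
            return new int[] {k - (bMid - bStart + 1), aStart, aMid, bMid + 1, bEnd};
        } else {
            // Symmetric case: drop A[aStart..aMid] and cut B after bMid.
            return new int[] {k - (aMid - aStart + 1), aMid + 1, aEnd, bStart, bMid};
        }
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30};
        int[] b = {1, 2, 3, 4};
        // k = 3 (0-based) asks for the 4th smallest of the merged arrays, which is 4.
        int[] next = step(a, b, 3, 0, a.length - 1, 0, b.length - 1);
        System.out.println(java.util.Arrays.toString(next)); // prints "[1, 0, 1, 2, 3]"
    }
}
```

Here aMid = 1 and bMid = 1, and A[1] = 20 > B[1] = 2, so B[0..1] is dropped and k falls from 3 to 1. The remaining windows are {10, 20} and {3, 4}, whose element at 0-based rank 1 is still 4.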

As you already mentioned, aMid + bMid + 1 = k must be satisfied to be able to conclude that:
when A[aMid] > B[bMid] we can throw away everything in B up to and including bMid and everything in A after aMid,
because (taking aMid and bMid as offsets within the current windows) at least aMid + (bMid + 1) = k elements sort before A[aMid], while at most aMid + bMid = k - 1 elements sort before B[bMid]. Therefore our median lies in the remaining parts of the arrays.

With this in mind, it does not really matter how we set up our two mid values aMid and bMid in the first place. The only thing to take care of is not letting one of them cause an IndexOutOfBoundsException.

int aMid = 0;
int bMid = k - aMid - 1;
if(bMid >= bLen) {
    bMid = bLen - 1;
    aMid = k - bMid - 1;
}

This would do the trick as well. But it would take more than O(log(n + m)) time, because in the worst case we skip only one element (A[0]) per call.
What we want is to always throw away a constant fraction of aLen + bLen.
In our case the new value of k is:

A > B: k = k - (bMid + 1) = k - (k - aMid) = aMid = k * (aLen / (aLen + bLen))
B > A: k = k - (aMid + 1) = k - (k * aLen / (aLen + bLen)) - 1 = k * (bLen / (aLen + bLen)) - 1

Ignoring the -1 and assuming that A > B is as likely as B > A, we get:
E(k) = 0.5 * k * (aLen / (aLen + bLen)) + 0.5 * k * (bLen / (aLen + bLen))
     = 0.5 * k * (aLen + bLen) / (aLen + bLen) = 0.5 * k
This means that we need approximately O(log(n + m)) recursive calls until k is 0, at which point the function stops.
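To see the difference between the two ways of choosing aMid, the following sketch counts how many narrowing steps each one needs on the same input. It is an adaptation rather than the original program: the class name SplitStrategyDemo and the boolean parameter proportional are made up here, and the recursion is rewritten as a loop so that the slow variant does not deepen the call stack. The test input is chosen so that every element of a is smaller than every element of b, which is the bad case for the aMid = 0 variant.

```java
final class SplitStrategyDemo {
    /**
     * Returns {value, iterations}: the k-th smallest (0-based) element of the
     * two sorted arrays and the number of loop iterations it took to find it.
     * Assumes 0 <= k < a.length + b.length.
     */
    static int[] findKth(int[] a, int[] b, int k, boolean proportional) {
        int aStart = 0, aEnd = a.length - 1, bStart = 0, bEnd = b.length - 1;
        int iterations = 0;
        while (true) {
            iterations++;
            int aLen = aEnd - aStart + 1;
            int bLen = bEnd - bStart + 1;
            if (aLen == 0) return new int[] {b[bStart + k], iterations};
            if (bLen == 0) return new int[] {a[aStart + k], iterations};
            if (k == 0) return new int[] {Math.min(a[aStart], b[bStart]), iterations};

            int aMid;
            int bMid;
            if (proportional) {
                aMid = aLen * k / (aLen + bLen);
                bMid = k - aMid - 1;
            } else {
                // The alternative from the answer: start at A's first element
                // and only move right when bMid would leave B's window.
                aMid = 0;
                bMid = k - aMid - 1;
                if (bMid >= bLen) {
                    bMid = bLen - 1;
                    aMid = k - bMid - 1;
                }
            }
            aMid += aStart; // turn the offsets into array indices
            bMid += bStart;

            if (a[aMid] > b[bMid]) {
                k -= bMid - bStart + 1;
                aEnd = aMid;
                bStart = bMid + 1;
            } else {
                k -= aMid - aStart + 1;
                bEnd = bMid;
                aStart = aMid + 1;
            }
        }
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] a = new int[n];
        int[] b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;     // 0 .. n-1
            b[i] = n + i; // n .. 2n-1, all larger than anything in a
        }
        int[] fixed = findKth(a, b, n - 1, false);
        int[] prop = findKth(a, b, n - 1, true);
        System.out.println("aMid = 0:     value " + fixed[0] + ", iterations " + fixed[1]);
        System.out.println("proportional: value " + prop[0] + ", iterations " + prop[1]);
    }
}
```

On this input the aMid = 0 variant drops a single element of a per iteration, so it needs on the order of n iterations, while the proportional split roughly halves k each time and finishes in about ten.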
