
The intersection of two sorted arrays

Given two sorted arrays, A and B. The size of array A is La and the size of array B is Lb. How do you find the intersection of A and B?

If La is much bigger than Lb, does that make any difference to the intersection-finding algorithm?

Since this looks like homework, I'll give you the algorithm:

Let arr1,arr2 be the two sorted arrays of length La and Lb.
Let i be index into the array arr1.
Let j be index into the array arr2.
Initialize i and j to 0.

while(i < La and j < Lb) do

    if(arr1[i] == arr2[j]) { // found a common element.
        print arr1[i] // print it.
        increment i // move on.
        increment j
    }
    else if(arr1[i] > arr2[j])
        increment j // don't change i, move j.
    else
        increment i // don't change j, move i.
end while
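
The pseudocode above can be sketched as runnable Python. This is only a minimal sketch: the function name `intersect_sorted` is made up for illustration, and it assumes both inputs are lists sorted in ascending order. It collects the common elements in a list instead of printing them.

```python
def intersect_sorted(arr1, arr2):
    """Return the common elements of two ascending-sorted lists."""
    i = j = 0
    result = []
    while i < len(arr1) and j < len(arr2):
        if arr1[i] == arr2[j]:
            # found a common element: keep it and move both indices
            result.append(arr1[i])
            i += 1
            j += 1
        elif arr1[i] > arr2[j]:
            j += 1  # arr2[j] is too small to match anything left in arr1
        else:
            i += 1  # arr1[i] is too small to match anything left in arr2
    return result


print(intersect_sorted([1, 2, 3, 5, 7, 9, 90], [2, 4, 10, 90]))  # [2, 90]
```

Each loop iteration advances at least one index, so the loop runs at most La + Lb times.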

I've been struggling with the same problem for a while now. So far I have come up with:

  1. Linear matching, which is O(m+n) in the worst case. You keep two pointers (A and B), each pointing to the beginning of one array. Then you advance whichever pointer points to the smaller value until you reach the end of one of the arrays, at which point there are no more common elements. Whenever *A == *B, you have found an element of the intersection.

  2. Binary matching, which is roughly O(n*log(m)) in the worst case, where n is the size of the smaller array and m the size of the bigger one. You pick the smaller array and binary-search the bigger array for each of its elements. If you want to be fancier, you can use the position where the last binary search ended as the starting point for the next one. This only marginally improves the worst case, but for some inputs it might perform miracles :)

  3. Double binary matching. It's a variation of regular binary matching. You take the element from the middle of the smaller array and binary-search for it in the bigger array. If you find nothing, you cut the smaller array in half (and yes, you can toss the element you already used) and cut the bigger array in two at the point where the binary search failed. Then you repeat for each pair of halves. Results are better than O(n*log(m)), but I am too lazy to calculate what they are.

The first two are the most basic ones, and both have merits. Linear matching is a bit easier to implement. Binary matching is arguably faster (although there are plenty of cases where linear matching will outperform it).

If anyone knows anything better than that, I would love to hear it. Matching arrays is what I do these days.

P.S. Don't quote me on the terms "linear matching" and "binary matching"; I made them up myself, and there are probably fancy names for them already.
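
Variant 2, including the trick of restarting each search where the previous one ended, might look roughly like this in Python. The function name `intersect_binary` and the parameter names `small` and `big` are hypothetical; the sketch assumes both lists are sorted in ascending order and relies on the standard `bisect.bisect_left`, which accepts a lower bound for the search.

```python
from bisect import bisect_left


def intersect_binary(small, big):
    """Binary-search `big` for each element of `small` (both ascending)."""
    result = []
    lo = 0  # everything in big[:lo] is already known to be too small
    for x in small:
        # search only big[lo:], resuming where the previous search ended
        lo = bisect_left(big, x, lo)
        if lo == len(big):
            break  # every remaining element of big is smaller than x
        if big[lo] == x:
            result.append(x)
            lo += 1  # consume the matched element
    return result


print(intersect_binary([2, 4, 10, 90], [1, 2, 3, 5, 7, 9, 90]))  # [2, 90]
```

Because `lo` never moves backwards, later searches cover a shrinking suffix of the bigger array.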

Use set_intersection (std::set_intersection from the C++ <algorithm> header). The usual implementation works much like the merge step of the merge sort algorithm.

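#include <iostream>
using namespace std;

void Intersect()
{
    const int la = 5;
    const int lb = 100;
    int A[la];
    int B[lb];
    for (int i = 0; i < la; ++i)
        A[i] = i + 1;           // A = 1..5
    for (int j = 0; j < lb; ++j)
        B[j] = j + 2;           // B = 2..101
    // The intersection can hold at most min(la, lb) elements.
    int newSize = la < lb ? la : lb;
    int* C = new int[newSize];
    int i = 0, j = 0, k = 0;    // k counts the matches stored in C
    while (i < la && j < lb)
    {
        if (A[i] < B[j])
            i++;
        else if (A[i] > B[j])
            j++;
        else
        {
            C[k++] = A[i];      // advance k only when a match is stored
            i++;
            j++;
        }
    }
    for (int n = 0; n < k; ++n) // print only the k matches that were found
        cout << C[n] << '\n';
    delete[] C;
}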

This is in Java, but it does what you want. It implements variant 3 mentioned in Nazar's answer (a "double" binary search) and should be the fastest solution. I'm pretty sure this beats any kind of "galloping" approach. Galloping just wastes time getting started with small steps, while we jump right in with a top-down binary search.

It's not immediately obvious which complexity class applies here. We do binary searches in the longer array but never look at the same element twice, so we are definitely within O(m+n).

This code has been thoroughly tested with random data.

import java.util.Arrays;

// main function. may return null when result is empty
static int[] intersectSortedIntArrays(int[] a, int[] b) {
  return intersectSortedIntArrays(a, b, null);
}

// no (intermediate) waste version: reuse buffer
static int[] intersectSortedIntArrays(int[] a, int[] b, IntBuffer buf) {
  int i = 0, j = 0, la = lIntArray(a), lb = lIntArray(b);
  
  // swap if a is longer than b
  if (la > lb) {
    int[] temp = a; a = b; b = temp;
    int temp2 = la; la = lb; lb = temp2;
  }
  
  // special case zero elements
  if (la == 0) return null;
  
  // special case one element
  if (la == 1)
    return Arrays.binarySearch(b, a[0]) >= 0 ? a : null;
    
  if (buf == null) buf = new IntBuffer(); else buf.reset();
  intersectSortedIntArrays_recurse(a, b, buf, 0, la, 0, lb);
  return buf.toArray();
}

static void intersectSortedIntArrays_recurse(int[] a, int[] b, IntBuffer buf, int aFrom, int aTo, int bFrom, int bTo) {
  if (aFrom >= aTo || bFrom >= bTo) return; // nothing to do
  
  // start in the middle of a, search this element in b
  int i = (aFrom+aTo)/2;
  int x = a[i];
  int j = Arrays.binarySearch(b, bFrom, bTo, x);

  if (j >= 0) {
    // element found
    intersectSortedIntArrays_recurse(a, b, buf, aFrom, i, bFrom, j);
    buf.add(x);
    intersectSortedIntArrays_recurse(a, b, buf, i+1, aTo, j+1, bTo);
  } else {
    j = -j-1;
    intersectSortedIntArrays_recurse(a, b, buf, aFrom, i, bFrom, j);
    intersectSortedIntArrays_recurse(a, b, buf, i+1, aTo, j, bTo);
  }
}


static int lIntArray(int[] a) {
  return a == null ? 0 : a.length;
}


static class IntBuffer {
  int[] data;
  int size;
  
  IntBuffer() {}
  IntBuffer(int size) { if (size != 0) data = new int[size]; }
  
  void add(int i) {
    if (size >= lIntArray(data))
      data = resizeIntArray(data, Math.max(1, lIntArray(data)*2));
    data[size++] = i;
  }
  
  int[] toArray() {
    return size == 0 ? null : resizeIntArray(data, size);
  }
  
  void reset() { size = 0; }
}

static int[] resizeIntArray(int[] a, int n) {
  if (n == lIntArray(a)) return a;
  int[] b = new int[n];
  arraycopy(a, 0, b, 0, Math.min(lIntArray(a), n));
  return b;
}

static void arraycopy(Object src, int srcPos, Object dest, int destPos, int n) {
  if (n != 0)
    System.arraycopy(src, srcPos, dest, destPos, n);
}
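
For comparison, the "galloping" approach this answer refers to is not spelled out anywhere in the thread. One common reading is an exponential search: probe the longer array with doubling steps to bracket the target, then binary-search inside that bracket. The sketch below follows that reading only; the names `gallop` and `intersect_galloping` are made up, both lists are assumed to be sorted in ascending order, and it makes no claim about which approach is faster.

```python
from bisect import bisect_left


def gallop(big, x, lo):
    """Return the first index >= lo whose element is >= x, probing with doubling steps."""
    step = 1
    hi = lo
    while hi < len(big) and big[hi] < x:
        lo = hi + 1   # big[hi] < x, so the answer lies strictly after hi
        hi += step
        step *= 2
    # the answer is now bracketed in [lo, min(hi, len(big))]
    return bisect_left(big, x, lo, min(hi, len(big)))


def intersect_galloping(small, big):
    """Intersect two ascending lists by galloping through `big`."""
    result = []
    pos = 0
    for x in small:
        pos = gallop(big, x, pos)
        if pos == len(big):
            break  # nothing left in big can match
        if big[pos] == x:
            result.append(x)
            pos += 1  # consume the matched element
    return result


print(intersect_galloping([2, 4, 10, 90], [1, 2, 3, 5, 7, 9, 90]))  # [2, 90]
```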

Let's consider two sorted arrays:

int[] array1 = {1,2,3,4,5,6,7,8};
int[] array2 = {2,4,8};

int i=0, j=0;    //taken two pointers

The while loop runs until either pointer reaches the end of its array.

while(i<array1.length && j<array2.length){
    if(array1[i] > array2[j])     //if first array element is bigger then increment 2nd pointer
       j++;
    else if(array1[i] < array2[j]) // same checking for second array element
      i++;
    else {                         //if both are equal then print them and increment both pointers
        System.out.print(array1[i]+ " ");
        i++;                       // the && loop condition stops us before either index goes out of bounds
        j++;
    }
}

Output will be:

2 4 8

Very simple with Python.

Example: A=[1,2,3,5,7,9,90] B=[2,4,10,90]

Here it is in three lines of code:

for i in A:
     if(i in B):
        print(i)

Output: 2, 90

Here is an answer I have tested that matches two arrays which are both sorted but might have duplicate keys and values as entries. I.e., both lists are sorted by the key 'timestamp', and .equals then detects matches. This one finds the intersection of a and b where the intersection of duplicates consumes them: each element of a that matches an element in b uses up that a entry. Sorry for the specifics here from a particular project, but maybe it is useful.

I finally arrived at the solution below after a lot of messing around with HashSet (doesn't handle duplicates) and Guava MultiSet (excellent, but if you look into it, it has a LOT of overhead checking).

   /**
     * Finds the intersection of events in a that are in b. Assumes both lists
     * are sorted by timestamp in non-decreasing order (timestamps may repeat).
     *
     *
     * @param a ArrayList<BasicEvent> of a
     * @param b likewise
     * @return ArrayList of intersection
     */
    private ArrayList<BasicEvent> countIntersect(ArrayList<BasicEvent> a, ArrayList<BasicEvent> b) {
        ArrayList<BasicEvent> intersect = new ArrayList(a.size() > b.size() ? a.size() : b.size());
        int count = 0;
        if (a.isEmpty() || b.isEmpty()) {
            return new ArrayList();
        }

        // TODO test case
//        a = new ArrayList();
//        b = new ArrayList();
//        a.add(new BasicEvent(4, (short) 0, (short) 0)); // first arg is the timestamp
//        a.add(new BasicEvent(4, (short) 0, (short) 0));
//        a.add(new BasicEvent(4, (short) 1, (short) 0));
//        a.add(new BasicEvent(4, (short) 2, (short) 0));
////        a.add(new BasicEvent(2, (short) 0, (short) 0));
////        a.add(new BasicEvent(10, (short) 0, (short) 0));
//
//        b.add(new BasicEvent(2, (short) 0, (short) 0));
//        b.add(new BasicEvent(2, (short) 0, (short) 0));
//        b.add(new BasicEvent(4, (short) 0, (short) 0));
//        b.add(new BasicEvent(4, (short) 0, (short) 0));
//        b.add(new BasicEvent(4, (short) 1, (short) 0));
//        b.add(new BasicEvent(10, (short) 0, (short) 0));
        int i = 0, j = 0;
        int na = a.size(), nb = b.size();
        while (i < na && j < nb) {
            if (a.get(i).timestamp < b.get(j).timestamp) {
                i++;
            } else if (b.get(j).timestamp < a.get(i).timestamp) {
                j++;
            } else {
                // If timestamps equal, it might be identical events or maybe not
                // and there might be several events with identical timestamps.
                // We MUST match all a with all b.
                // We don't want to increment both pointers or we can miss matches.
                // We do an inner double loop for exhaustive matching as long as the timestamps
                // are identical. 
                int i1 = i, j1 = j;
                while (i1 < na && j1 < nb && a.get(i1).timestamp == b.get(j1).timestamp) {
                    boolean match = false;
                    while (j1 < nb && i1 < na && a.get(i1).timestamp == b.get(j1).timestamp) {
                        if (a.get(i1).equals(b.get(j1))) {
                            count++;
                            intersect.add(b.get(j1)); // TODO debug
                            // we have a match, so use up the a element
                            i1++;
                            match = true;
                        }
                        j1++;
                    }
                    if (!match) {
                        i1++; // 
                    }
                    j1 = j; // reset j to start of matching timestamp region
                }
                i = i1; // when done, timestamps are different or we reached end of either or both arrays
                j = j1;
            }
        }
//        System.out.println("%%%%%%%%%%%%%%");
//        printarr(a, "a");
//        printarr(b, "b");
//        printarr(intersect, "intsct");
        return intersect;
    }

    // TODO test case
    void printarr(ArrayList<BasicEvent> a, String n) {
        final int MAX = 30;
        if (a.size() > MAX) {
            System.out.printf("--------\n%s[%d]>%d\n", n, a.size(), MAX);
            return;
        }
        System.out.printf("%s[%d] --------\n", n, a.size());
        for (int i = 0; i < a.size(); i++) {
            BasicEvent e = a.get(i);
            System.out.printf("%s[%d]=[%d %d %d %d]\n", n, i, e.timestamp, e.x, e.y, (e instanceof PolarityEvent) ? ((PolarityEvent) e).getPolaritySignum() : 0);
        }
    }

// intersection of two arrays
#include <iostream>
#include <vector>
using namespace std;

int main() {
    int m, n;
    cout << "Enter the number of elements in array 1: ";
    cin >> m;
    cout << "Enter the number of elements in array 2: ";
    cin >> n;
    // size the arrays only after m and n are known
    vector<int> arr1(m), arr2(n);
    for (int i = 0; i < m; i++) {
        cin >> arr1[i];
    }
    for (int j = 0; j < n; j++) {
        cin >> arr2[j];
    }
    // brute force: for each element of arr2, scan arr1 for a match
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            if (arr1[i] == arr2[j]) {
                cout << arr1[i] << ' ';
                break;
            }
        }
    }
    return 0;
}
