
Find common elements in two unsorted arrays

I am trying to find a solution to this problem: I have two arrays of integers, A and B (A and B can have different sizes). I have to find the common elements in these two arrays. There is one more condition: the maximum distance between the common elements is k. This is my solution, which I think is correct:

for (int i = 0; i<A.length; i++){
    for (int j=jlimit; (j<B.length) && (j <= ks); j++){
        if(A[i]==B[j]){
            System.out.println(B[j]);
            jlimit = j;
            ks = j+k;
        }//end if
    }
}

Is there a way to make a better solution? Any suggestions? Thanks in advance!

Given your explanation, I think the most direct approach is reading array A, putting all elements in a Set (setA), doing the same with B (setB), and using the retainAll method to find the intersection of both sets (items that belong to both of the sets).

You will see that the k distance is not used at all, but I see no way to use that condition that leads to code that is either faster or more maintainable. The solution I advocate works without enforcing that condition, so it also works when the condition is true (that is called "weakening the preconditions").
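The set-based approach described above could look roughly like this. It is only a minimal sketch: the class name SetIntersection and the method name common are made up for illustration, and, as the answer says, the k distance is deliberately ignored.

```java
import java.util.HashSet;
import java.util.Set;

final class SetIntersection {
    // Returns the distinct values present in both arrays (iteration order is not guaranteed).
    // Hypothetical helper: the k-distance condition is NOT enforced here.
    static Set<Integer> common(int[] a, int[] b) {
        Set<Integer> setA = new HashSet<>();
        for (int x : a) {
            setA.add(x);
        }
        Set<Integer> setB = new HashSet<>();
        for (int x : b) {
            setB.add(x);
        }
        setA.retainAll(setB); // keep only the elements that are also in setB
        return setA;
    }
}
```

Building both sets is linear in the total number of elements on average, at the cost of extra memory for the two sets.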

IMPLEMENT BINARY SEARCH AND QUICK SORT!

This will lead to tons of code, but the fastest result.

You can sort the elements of the larger array with something like quick sort, which would lead to O(n log n).

Then iterate through the smaller array and, for each value, do a binary search for that particular element in the other array. Add some logic for the distance in the binary search.

I think you can get the complexity down to O(n log n). Worst case O(n^2).

Pseudo code:

larger array equals a
other array equals b

sort a

iterate through b
       binary search a for the element of b at the iterated index
     // I would throw (last index - index) logic in binary search
     // to exit out of that even faster by returning "NOT FOUND" as soon as that is hit.
       if found && (last index - index) is less than or equal to k
          store last index
          print value

This is the fastest way possible to solve your problem, I believe.
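For reference, the sort-then-binary-search idea from the pseudo code can be sketched in Java with the standard library's Arrays.sort and Arrays.binarySearch instead of hand-written versions. The class and method names are made up for illustration, and this sketch leaves out the k-distance logic, because sorting discards the original positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortAndSearch {
    // Sorts a copy of the larger array, then binary-searches it
    // for every element of the smaller array.
    // Assumption: the k-distance condition is not checked here.
    static List<Integer> common(int[] x, int[] y) {
        int[] larger = x.length >= y.length ? x : y;
        int[] smaller = x.length >= y.length ? y : x;

        int[] sorted = larger.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);

        List<Integer> found = new ArrayList<>();
        for (int value : smaller) {
            // binarySearch returns a non-negative index when the value is present
            if (Arrays.binarySearch(sorted, value) >= 0) {
                found.add(value);
            }
        }
        return found;
    }
}
```

A value that occurs several times in the smaller array is reported once per occurrence; collect into a Set instead if only distinct values are wanted.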

Although this would be a cheat, since it uses HashSets, it is pretty nice for a Java implementation of this algorithm. If you need the pseudocode for the algorithm, don't read any further.

Source and author are in the JavaDoc. Cheers.

import java.util.HashSet;

/**
 * @author Crunchify.com
 */
public class CrunchifyIntersection {

    public static void main(String[] args) {
         Integer[ ] arrayOne = { 1, 4, 5, 2, 7, 3, 9 };
         Integer[ ] arrayTwo = { 5, 2, 4, 9, 5 };

         // findCommon is defined in this class, so call it through CrunchifyIntersection
         Integer[ ] common = CrunchifyIntersection.findCommon( arrayOne, arrayTwo );

         System.out.print( "Common Elements Between Two Arrays: " );       
         for( Integer entry : common ) {
              System.out.print( entry + " " );
         }
   }

   public static Integer[ ] findCommon( Integer[ ] arrayOne, Integer[ ] arrayTwo ) {

        Integer[ ] arrayToHash;
        Integer[ ] arrayToSearch;

        if( arrayOne.length < arrayTwo.length ) {
            arrayToHash = arrayOne;
            arrayToSearch = arrayTwo;
        } else {
            arrayToHash = arrayTwo;
            arrayToSearch = arrayOne;
        }

        HashSet<Integer> intersection = new HashSet<Integer>( );

        HashSet<Integer> hashedArray = new HashSet<Integer>( );
        for( Integer entry : arrayToHash ) {
            hashedArray.add( entry );
        }

        for( Integer entry : arrayToSearch ) {
            if( hashedArray.contains( entry ) ) {
                intersection.add( entry );
            }
        }

        return intersection.toArray( new Integer[ 0 ] );
    }
 }

Your implementation is roughly O(A.length*2k).

That seems to be about the best you're going to do if you want to maintain your "no more than k away" logic, as that rules out sorting and the use of sets. I would alter it a little to make your code more understandable.

  1. First, I would ensure that you iterate over the smaller of the two arrays. This would make the complexity O(min(A.length, B.length)*2k).

    To understand the purpose of this, consider the case where A has 1 element and B has 100. In this case, we are only going to perform one iteration in the outer loop, and k iterations in the inner loop.

    Now consider when A has 100 elements, and B has 1. In this case, we will perform 100 iterations on the outer loop, and 1 iteration each on the inner loop.

    If k is less than the length of your longer array, iterating over the shorter array in the outer loop will be more efficient.

  2. Then, I would change how you're calculating the k distance just for readability's sake. The code I've written demonstrates this.

Here's what I would do:

//not sure what type of array we're dealing with here, so I'll assume int.
int[] toIterate;
int[] toSearch;

if (A.length > B.length)
{
    toIterate = B;
    toSearch = A;
}
else
{
    toIterate = A;
    toSearch = B;
}

for (int i = 0; i < toIterate.length; i++)
{
    // set j to k away in the negative direction
    int j = i - k;

    if (j < 0) 
        j = 0;

    // only iterate until j is k past i
    for (; (j < toSearch.length) && (j <= i + k); j++)
    {
        if(toIterate[i] == toSearch[j])
        {
            System.out.println(toSearch[j]);
        }
    }
}

Your use of jlimit and ks may work, but handling your k distance like this is more understandable for your average programmer (and it's marginally more efficient).

O(N) solution (BloomFilters):

Here is a solution using bloom filters (the implementation is from the Guava library):

public static <T> T findCommon_BloomFilterImpl(T[] A, T[] B, Funnel<T> funnel) {
    BloomFilter<T> filter = BloomFilter.create(funnel, A.length + B.length);
    for (T t : A) {
        filter.put(t);
    }
    for (T t : B) {
        if (filter.mightContain(t)) {
            return t;
        }
    }
    return null;
}

Use it like this:

    Integer j = Masking.findCommon_BloomFilterImpl(new Integer[]{12, 2, 3, 4, 5222, 622, 71, 81, 91, 10}, new Integer[]{11, 100, 15, 18, 79, 10}, Funnels.integerFunnel());
    Assert.assertNotNull(j);
    Assert.assertEquals(10, j.intValue());

Runs in O(N) since calculating the hash for an Integer is pretty straightforward. So it is still O(N) if you can reduce the hash calculation of your elements to O(1) or a small O(K), where K is the size of each element. Keep in mind that mightContain can return false positives, so a hit should be confirmed against the actual data when an exact answer is required.

O(N.LogN) solution (sorting and iterating):

Sorting and then iterating through the array will lead you to an O(N*log(N)) solution:

public static <T extends Comparable<T>> T findCommon(T[] A, T[] B, Class<T> clazz) {
    T[] array = concatArrays(A, B, clazz);
    Arrays.sort(array);
    for (int i = 1; i < array.length; i++) {
        if (array[i - 1].equals(array[i])) {     //put your own equality check here
            return array[i];
        }
    }
    return null;
}

concatArrays(~) is in O(N) of course. Arrays.sort(~) has complexity O(N.logN) (for an object array such as T[] it is a merge-sort variant; the dual-pivot QuickSort is used for primitive arrays), and iterating through the array again is O(N).

So we have O((N+2).logN) ~> O(N.logN).

As a general-case solution (without the "within k" condition of your problem), this is better than yours. It should be considered for k "close to" N in your precise case.
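The concatArrays helper is not shown in the answer. A minimal sketch of what it might look like, assuming it simply copies A followed by B into a new array of the given component type (the class name ArrayUtil is made up for illustration):

```java
import java.lang.reflect.Array;

final class ArrayUtil {
    // Hypothetical implementation of the concatArrays helper used above:
    // returns a new array holding all of a followed by all of b. Runs in O(N).
    @SuppressWarnings("unchecked")
    static <T> T[] concatArrays(T[] a, T[] b, Class<T> clazz) {
        T[] result = (T[]) Array.newInstance(clazz, a.length + b.length);
        System.arraycopy(a, 0, result, 0, a.length);
        System.arraycopy(b, 0, result, a.length, b.length);
        return result;
    }
}
```

The Class<T> parameter is needed because Java cannot create a generic array directly with new T[n].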

Simple solution if the arrays are already sorted:

 public static void get_common_courses(Integer[] courses1, Integer[] courses2) {
        // Sort both arrays if input is not sorted 
        //Arrays.sort(courses1);
        //Arrays.sort(courses2);
        int i=0, j=0;
        while(i<courses1.length && j<courses2.length) {
            if(courses1[i] > courses2[j]) {
                j++;
            } else if(courses1[i] < courses2[j]){
                i++;
            } else {
                System.out.println(courses1[i]);
                i++;j++;
            }
        }
}

The Apache Commons Collections API has done this in an efficient way without sorting:

    public static Collection intersection(final Collection a, final Collection b) {
    ArrayList list = new ArrayList();
    Map mapa = getCardinalityMap(a);
    Map mapb = getCardinalityMap(b);
    Set elts = new HashSet(a);
    elts.addAll(b);
    Iterator it = elts.iterator();
    while(it.hasNext()) {
        Object obj = it.next();
        for(int i=0,m=Math.min(getFreq(obj,mapa),getFreq(obj,mapb));i<m;i++) {
            list.add(obj);
        }
    }
    return list;
}
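The snippet above relies on getCardinalityMap and getFreq, which are not shown. The following is a minimal generic sketch of the same cardinality-map idea, not the library's actual source; the class name CardinalityIntersection and the helper body are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CardinalityIntersection {
    // Counts how many times each element occurs in the collection.
    static <T> Map<T, Integer> getCardinalityMap(Collection<T> c) {
        Map<T, Integer> counts = new HashMap<>();
        for (T item : c) {
            counts.merge(item, 1, Integer::sum);
        }
        return counts;
    }

    // Multiset intersection: each element appears min(count in a, count in b) times.
    static <T> List<T> intersection(Collection<T> a, Collection<T> b) {
        Map<T, Integer> mapA = getCardinalityMap(a);
        Map<T, Integer> mapB = getCardinalityMap(b);
        List<T> result = new ArrayList<>();
        for (Map.Entry<T, Integer> e : mapA.entrySet()) {
            int m = Math.min(e.getValue(), mapB.getOrDefault(e.getKey(), 0));
            for (int i = 0; i < m; i++) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Unlike a plain set intersection, this keeps duplicates: intersecting [1, 2, 2, 3] with [2, 2, 2, 4] yields two copies of 2.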

Solution using Java 8:

static <T> Collection<T> intersection(Collection<T> c1, Collection<T> c2) {
    if (c1.size() < c2.size())
        return intersection(c2, c1);
    Set<T> c2set = new HashSet<>(c2);
    return c1.stream().filter(c2set::contains).distinct().collect(Collectors.toSet());
}

Use Arrays::asList and boxed values of primitives:

Integer[] a =...    
Collection<Integer> res = intersection(Arrays.asList(a),Arrays.asList(b));
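If the inputs are primitive int[] arrays rather than Integer[], Arrays.asList will not box the elements for you. One possible workaround, sketched here with a made-up class name, is to box through an IntStream first:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

final class PrimitiveIntersection {
    // Boxes both int[] arrays via streams and keeps the values of a that also occur in b.
    static Set<Integer> intersection(int[] a, int[] b) {
        Set<Integer> bSet = Arrays.stream(b).boxed().collect(Collectors.toSet());
        return Arrays.stream(a)
                .boxed()
                .filter(bSet::contains)
                .collect(Collectors.toSet());
    }
}
```

Collecting into a Set already removes duplicates, so no separate distinct() step is needed in this variant.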

Generic solution:

public static void main(String[] args) {
    String[] a = { "a", "b" };
    String[] b = { "c", "b" };
    String[] intersection = intersection(a, b, a[0].getClass());
    System.out.println(Arrays.toString(intersection));
    Integer[] aa = { 1, 3, 4, 2 };
    Integer[] bb = { 1, 19, 4, 5 };
    Integer[] intersectionaabb = intersection(aa, bb, aa[0].getClass());
    System.out.println(Arrays.toString(intersectionaabb));
}

@SuppressWarnings("unchecked")
private static <T> T[] intersection(T[] a, T[] b, Class<? extends T> c) {
    HashSet<T> s = new HashSet<>(Arrays.asList(a));
    s.retainAll(Arrays.asList(b));
    return s.toArray((T[]) Array.newInstance(c, s.size()));
}

Output:

[b]
[1, 4]
