
How to intersect two sorted arrays the fastest possible way?

I have two huge sorted arrays (~100K items each). I need to intersect them really fast. Right now I am doing it the standard way:

  • if a[i] < b[j] then i++
  • if a[i] > b[j] then j++
  • else: add a[i] to the intersection, i++, j++

But it takes too long (~350 microseconds) to complete, which results in quite poor overall performance. Is there a way to do it quicker?

P.S. The intersection is no larger than 1000 items (on average), and I need only 25 to 100 of them.

Walking through two 100k arrays side by side requires about 200k comparisons. You are currently completing it in 350 microseconds = 350k nanoseconds. So your per-comparison time is just under 2 nanoseconds (350k / 200k = 1.75). If your CPU is around 4 GHz, that's about 7 clock cycles.

That's good. You could try to be sophisticated, detecting runs and so on, but you'll probably lose more to pipeline stalls than you save in work.
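For reference, one way to read "detecting runs" is a galloping (exponential) search that jumps over a stretch of values that cannot match. The sketch below is an assumption about what such a variant could look like, not something the original poster wrote; the class and method names are made up for illustration. On evenly interleaved data like two random 100k arrays, the runs are short, so the extra branching is unlikely to pay off, which is the point made above.

```java
import java.util.Arrays;

final class GallopingIntersection {

    /**
     * Returns the first index >= from whose value is >= key (a.length if none).
     * Probes at doubling distances, then binary-searches the last gap.
     */
    static int gallop(int[] a, int from, int key) {
        int lo = from;   // everything before lo is known to be < key
        int hi = from;   // next index to probe
        int step = 1;
        while (hi < a.length && a[hi] < key) {
            lo = hi + 1;
            hi += step;
            step <<= 1;
        }
        if (hi > a.length) {
            hi = a.length;
        }
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Intersects two sorted arrays, skipping non-matching runs with gallop(). */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int n = 0, i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i = gallop(a, i, b[j]);
            } else if (a[i] > b[j]) {
                j = gallop(b, j, a[i]);
            } else {
                out[n++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

Galloping tends to help when one array is much smaller than the other or when matches are clustered; neither is the case described in the question.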

There are only two ways to speed this up: do less work, or add more workers.

You've indicated that doing less work is feasible, which is why Tamas Hegedus suggested it. Instead of creating the whole intersection, create an Iterator that returns the next element of the intersection. This will require you to rewrite the logic that uses said iterator, but you'll do under 10% of the current computation, which should make it close to 10x faster.
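A minimal sketch of such an iterator, assuming int arrays sorted in ascending order. The names IntersectionIterator and firstCommon are hypothetical, chosen only for this example; java.util.PrimitiveIterator.OfInt is used so that values are not boxed.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/** Lazily yields the common values of two sorted int arrays, one at a time. */
final class IntersectionIterator implements PrimitiveIterator.OfInt {
    private final int[] a;
    private final int[] b;
    private int i = 0;
    private int j = 0;

    IntersectionIterator(int[] a, int[] b) {
        this.a = a;
        this.b = b;
    }

    /** Advances both cursors until they sit on a common value or one array ends. */
    @Override
    public boolean hasNext() {
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                return true; // a[i] == b[j]; nextInt() will return it
            }
        }
        return false;
    }

    @Override
    public int nextInt() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int value = a[i];
        i++;
        j++;
        return value;
    }

    /** Collects at most limit common values, then stops scanning. */
    static int[] firstCommon(int[] a, int[] b, int limit) {
        IntersectionIterator it = new IntersectionIterator(a, b);
        int[] out = new int[limit];
        int n = 0;
        while (n < limit && it.hasNext()) {
            out[n++] = it.nextInt();
        }
        return Arrays.copyOf(out, n);
    }
}
```

A caller that needs only 25 to 100 values would call something like firstCommon(a, b, 100) and never touch the rest of either array. How much work that saves depends on where in the arrays the first matches fall.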

As for adding workers, you'll want to divide the work among worker threads and keep them from stepping all over each other. For small k (no larger than your number of CPUs!), with an amount of work that is logarithmic in the size of your arrays, you can find k-1 values that break the combined arrays into k even chunks, along with the indexes of those values in each array. (I originally suggested a quickselect for this; adapt http://www.geeksforgeeks.org/median-of-two-sorted-arrays/ instead.) This creates k problems of equal difficulty, each of which can be specified by 4 numbers. Spin up k threads and let each compute a chunk of the answer. This will be roughly k times faster than what you are currently doing.
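The sketch below illustrates the chunking idea with one simplification, which is an assumption of this example rather than the method described above: instead of balancing the combined arrays, it takes k-1 evenly spaced values from the first array as pivots and binary-searches each pivot in both arrays. Chunks are therefore even in the first array only. Splitting by value (not by index) guarantees that equal values always fall into the same chunk. ParallelIntersection and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelIntersection {

    /** Index of the first element >= key in a sorted array (a.length if none). */
    static int lowerBound(int[] a, int key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Sequential merge intersection of a[aFrom, aTo) and b[bFrom, bTo). */
    static int[] intersectRange(int[] a, int aFrom, int aTo, int[] b, int bFrom, int bTo) {
        int[] out = new int[Math.min(aTo - aFrom, bTo - bFrom)];
        int n = 0, i = aFrom, j = bFrom;
        while (i < aTo && j < bTo) {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else { out[n++] = a[i]; i++; j++; }
        }
        return Arrays.copyOf(out, n);
    }

    /** Splits the work into k value ranges (k >= 1) and intersects each range on its own thread. */
    static int[] intersect(int[] a, int[] b, int k) {
        if (a.length == 0 || b.length == 0) return new int[0];
        // Chunk c covers a[cutA[c], cutA[c+1]) and b[cutB[c], cutB[c+1]): 4 numbers per chunk.
        int[] cutA = new int[k + 1];
        int[] cutB = new int[k + 1];
        cutA[k] = a.length;
        cutB[k] = b.length;
        for (int c = 1; c < k; c++) {
            int pivot = a[(int) ((long) a.length * c / k)];
            cutA[c] = lowerBound(a, pivot);
            cutB[c] = lowerBound(b, pivot);
        }
        ExecutorService pool = Executors.newFixedThreadPool(k);
        try {
            List<Future<int[]>> futures = new ArrayList<>();
            for (int c = 0; c < k; c++) {
                final int aFrom = cutA[c], aTo = cutA[c + 1], bFrom = cutB[c], bTo = cutB[c + 1];
                Callable<int[]> task = () -> intersectRange(a, aFrom, aTo, b, bFrom, bTo);
                futures.add(pool.submit(task));
            }
            // Chunks are in ascending value order, so concatenating keeps the result sorted.
            List<int[]> parts = new ArrayList<>();
            int total = 0;
            for (Future<int[]> f : futures) {
                int[] part = f.get();
                parts.add(part);
                total += part.length;
            }
            int[] result = new int[total];
            int pos = 0;
            for (int[] part : parts) {
                System.arraycopy(part, 0, result, pos, part.length);
                pos += part.length;
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating a thread pool per call is only for brevity. At a budget of a few hundred microseconds, thread start-up alone could eat the gain, so a real version would presumably reuse a long-lived pool.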

At the cost of a lot more effort, these approaches can be combined. Have the iterator create, say, 4 workers and hand out blocks to each. When you call iter.next(), the iterator hands you the next value if it has one. If it doesn't, it waits for the worker that is producing its next block to complete, grabs that block, hands that worker another block if one is ready, and then hands out the first value in that block. You can play with the block size. You want it large enough that the CPU does a good job of figuring out that it should be streaming from RAM to the CPU caches, and doesn't think that there is synchronization contention between threads.

My guess is that, given the size and synchronization constraints, the hybrid approach won't be much of a win, if any, over the iterator approach. But if you're really desperate, you can try it.

I am posting a naive implementation of the problem/solutions: two arrays filled with random ints. If the threshold of 100 intersected values is reached, the loops break.

One loop uses the OP's logic. The other launches two threads, each processing one half of the data.

It seems that the threading overhead can be an issue. Or it may need fine-tuning.

This is a 20-run sample. Worst-case scenario: no intersection, which forces the run to the end of the arrays. Times are in microseconds.

Workers: 2806
Workers: 4197
Workers: 4235
Workers: 818
Workers: 729
Workers: 3376
Workers: 740
Workers: 688
Workers: 2245
Workers: 732
Workers: 330
Workers: 945
Workers: 605
Workers: 630
Workers: 630
Workers: 334
Workers: 643
Workers: 309
Workers: 290
Workers: 761
done
Sorted: 1525
Sorted: 405
Sorted: 550
Sorted: 880
Sorted: 265
Sorted: 267
Sorted: 252
Sorted: 310
Sorted: 253
Sorted: 272
Sorted: 285
Sorted: 270
Sorted: 270
Sorted: 315
Sorted: 267
Sorted: 269
Sorted: 265
Sorted: 258
Sorted: 269
Sorted: 289
done

package so;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.TimeUnit;
public final class CrazyClass {

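    static final int LIMIT = 100; // stop once this many common values have been found

    static class Feeder implements Runnable{
        final int b1, e1; // this worker's slice of k1001: [b1, e1)
        final int b2, e2; // this worker's slice of k1002: [b2, e2)
        final int[] k1001;
        final int[] k1002;

        final Set<Integer> setThis;

        Feeder(int[] ia, int[] ia1, int b1, int e1, int b2, int e2, Set<Integer> s){
            k1001 = ia;
            k1002 = ia1;
            this.b1 = b1;
            this.e1 = e1;
            this.b2 = b2;
            this.e2 = e2;
            setThis = s;
        }

        public void run() {
            int i1 = b1, i2 = b2;
            // Standard merge walk: advance whichever side holds the smaller value.
            while (i1 < e1 && i2 < e2){
                if (k1001[i1] < k1002[i2])
                    i1++;
                else if (k1001[i1] > k1002[i2])
                    i2++;
                else {
                    synchronized(setThis){
                        if (setThis.size() < LIMIT)
                            setThis.add(k1001[i1]);
                        if (setThis.size() >= LIMIT){
                            System.out.println("bye!!!");
                            break;
                        }
                    }
                    i1++;
                    i2++;
                }
            }
        }
    }

    // Index of the first element >= key in a sorted array (a.length if there is none).
    static int lowerBound(int[] a, int key){
        int lo = 0, hi = a.length;
        while (lo < hi){
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key)
                lo = mid + 1;
            else
                hi = mid;
        }
        return lo;
    }

    static void sorted(){
        int i1 = 0, i2 = 0;
        Set<Integer> result = new HashSet<Integer>();
        Random r = new Random();
        int[] k1001 = new int[100000];
        int[] k1002 = new int[100000];

        for(int i = 0; i< k1001.length; i++){
            k1001[i] = r.nextInt();
            k1002[i] = r.nextInt();
        }

        Arrays.sort(k1001);
        Arrays.sort(k1002);

        long l = System.nanoTime();

        // Each index advances only in the loop body, and both are bounds-checked.
        while (i1 < k1001.length && i2 < k1002.length){
            if (k1001[i1] < k1002[i2])
                i1++;
            else if (k1001[i1] > k1002[i2])
                i2++;
            else {
                result.add(k1001[i1]);
                if (result.size() == LIMIT){
                    System.out.println("bye!!!");
                    break;
                }
                i1++;
                i2++;
            }
        }
        l = System.nanoTime() - l;
        System.out.println("Sorted: " + TimeUnit.MICROSECONDS.convert(l, TimeUnit.NANOSECONDS));
    }

    static void workers(){
        Thread t1, t2;
        Set<Integer> setThis = new HashSet<Integer>();
        Random r = new Random();
        int[] k1001 = new int[100000];
        int[] k1002 = new int[100000];

        for(int i = 0; i< k1001.length; i++){
            k1001[i] = r.nextInt();
            k1002[i] = r.nextInt();
        }

        // The merge walk only works on sorted input.
        Arrays.sort(k1001);
        Arrays.sort(k1002);

        try{
            long l = System.nanoTime();
            // Split by value rather than by index, so that equal values
            // always land in the same worker.
            int pivot = k1001[k1001.length / 2];
            int m1 = lowerBound(k1001, pivot);
            int m2 = lowerBound(k1002, pivot);
            t1 = new Thread(new Feeder(k1001, k1002, 0, m1, 0, m2, setThis));
            t2 = new Thread(new Feeder(k1001, k1002, m1, k1001.length, m2, k1002.length, setThis));
            t1.start();
            t2.start();
            t1.join();
            t2.join();
            System.out.println("Workers: " + TimeUnit.MICROSECONDS.convert(System.nanoTime() - l, TimeUnit.NANOSECONDS));

        }catch(InterruptedException x){
            Thread.currentThread().interrupt();
        }
    }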

    static public void main(String[] args){
        int run = 20;
        for(int i = 0; i < run; i++)
            workers();
        System.out.println("done");
        for(int i = 0; i < run; i++)
            sorted();
        System.out.println("done");

    }
}

The code below runs in around 10 milliseconds for me. So I guess you are either processing strings or using a scripting language.

package com.example.so.algorithms;

import java.util.Arrays;
import java.util.Random;

/**
 * <p> http://stackoverflow.com/questions/42538902/how-to-intersect-two-sorted-arrays-the-fastest-possible-way#comment72213844_42538902 </p>
 * <p> Given two sorted sub-lists of 100k each determine the first 10 intersecting (common) entries within 350 millis </p>
 * @author Ravindra
 * @since 03March2017
 *
 */
public class TestMergeIntersection {

    /**
     * <pre>
Time (millis):9
Result :[442958664, 932132404, 988442487, 1356502780, 1614742980, 1923995812, 1985016181, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    </pre>
     * @param args
     */
    public static void main(String[] args) {
        handleTest();
    }

    private static void handleTest() {
        int size = 1024*128;
        int intersectionCount = 100;
        int[] arrayOne = generateSortedSublist(size);
        int[] arrayTwo = generateSortedSublist(size);
        int[] result = new int[intersectionCount];
        int count = 0;
        int i=0;
        int j=0;
        long start = System.currentTimeMillis();
        while(count < intersectionCount && i < size && j < size ) {
            if( arrayOne[i] < arrayTwo[j]) {
                i++;
            } else if(  arrayOne[i] > arrayTwo[j] ) {
                j++;
            } else {
                result[count] =arrayOne[i]; 
                i++;
                j++;
                count++;
            }
        }
        long end = System.currentTimeMillis();

        System.out.println("Time (millis):"+(end-start));
        System.out.println("Result :"+Arrays.toString(result));
    }

    private static int[] generateSortedSublist(int size) {

        Random random = new Random();
        int[] result = new int[size];

        for(int i=0;i<result.length;i++) {
            result[i] = random.nextInt(Integer.MAX_VALUE);
        }

        Arrays.sort(result);

        return result;
    }

}
