
High-performance buffering for a stream of rands

I have code that consumes a large number (millions currently, eventually billions) of relatively short (5-100 elements) arrays of random numbers and does some not-very-strenuous math with them. Random numbers being, well, random, ideally I'd like to generate them on multiple cores, since random number generation is > 50% of my runtime in profiling. However, I'm having difficulty distributing a large number of small tasks in a way that's not slower than the single-threaded approach.

My code currently looks something like this:

for(int i=0;i<1000000;i++){
    for(RealVector d:data){
        while(!converged){
            // Inner loops use j and k: redeclaring i here would not compile
            double[] shortVec = new double[5];
            for(int j=0;j<5;j++) shortVec[j]=rng.nextGaussian();
            double[] longerVec = new double[50];
            for(int k=0;k<50;k++) longerVec[k]=rng.nextGaussian();
            /*Do some relatively fast math*/
        }
    }
}

Approaches I've tried that have not worked:

  • 1+ threads populating an ArrayBlockingQueue, with my main loop consuming from it and populating the array (the boxing/unboxing was a killer here)
  • Generating the vectors with a Callable (yielding a Future) while doing the non-dependent parts of the math (it appears the overhead of the indirection outweighed whatever parallelism gains I got)
  • Using 2 ArrayBlockingQueues, each populated by a thread, one for the short and one for the long arrays (still roughly twice as slow as the direct single-threaded case).

I'm not looking for "solutions" to my particular problem so much as how to handle the general case of generating large streams of small, independent primitives in parallel and consuming them from a single thread.

The problem with your performance seems to be that the individual jobs are too small, so most of the time is spent on synchronizing and queueing the jobs themselves. One thing to consider is not to generate a large stream of small jobs but to deliver to each worker thread a medium-sized collection of jobs that it will annotate with the answers.

For example, instead of iterating through your loop with the first thread doing iteration #0, the next thread doing iteration #1, and so on, I would have the first thread do iterations #0 through #999 or some such. The threads should work independently and annotate a Job class with the answers of their calculations. Then at the end they can return the entire collection of finished jobs as a Future.

Your Job class might be something like the following:

public class Job {
    Collection<RealVector> dataCollection;
    Collection<SomeAnswer> answerCollection = new ArrayList<SomeAnswer>();
    public void run() {
        for (RealVector d : dataCollection) {
           // do the magic work on the vector
           while(!converged){
              ...
           }
           // put the associated "answer" in another collection
           answerCollection.add(someAnswer);
        }
    }
}
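As a rough illustration of the batching idea, here is a minimal sketch of handing each worker a chunk of iterations and collecting one Future per chunk. The names BatchedJobs, oneIteration and runInBatches are hypothetical, and the per-iteration "math" is only a placeholder sum, not the asker's actual computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;

final class BatchedJobs {
    /** Placeholder for the real work (an assumption): sums one short Gaussian vector. */
    static double oneIteration() {
        ThreadLocalRandom rng = ThreadLocalRandom.current(); // per-thread generator
        double sum = 0;
        for (int i = 0; i < 5; i++) sum += rng.nextGaussian();
        return sum;
    }

    /** Splits the iterations into about `batches` chunks; returns one answer array per chunk. */
    static List<double[]> runInBatches(int iterations, int batches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int batchSize = (iterations + batches - 1) / batches; // round up
            List<Future<double[]>> futures = new ArrayList<>();
            for (int start = 0; start < iterations; start += batchSize) {
                final int size = Math.min(batchSize, iterations - start);
                // One task per chunk, so queueing cost is paid once per chunk
                futures.add(pool.submit(() -> {
                    double[] answers = new double[size];
                    for (int i = 0; i < size; i++) answers[i] = oneIteration();
                    return answers;
                }));
            }
            List<double[]> results = new ArrayList<>();
            for (Future<double[]> f : futures) results.add(f.get()); // wait in submit order
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<double[]> results = runInBatches(1_000_000, 8, 4);
        System.out.println("batches completed: " + results.size());
    }
}
```

The chunk size is the tuning knob here: it probably needs to be large enough that the work per task dwarfs the cost of submitting it, which is best confirmed by profiling.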

Handing over whole buffers through an Exchanger, as in the class below, is more efficient than using a Queue because:

  • the payload is an array of double[], meaning the background thread can generate more data before having to pass it off.
  • all the objects are recycled.


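import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.Exchanger;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class RandomGenerator {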
    private final ExecutorService generator = Executors.newSingleThreadExecutor(new ThreadFactory() {
        @Override
        public Thread newThread(Runnable r) {
            Thread t = new Thread(r, "generator");
            t.setDaemon(true);
            return t;
        }
    });
    private final Exchanger<double[][]> exchanger = new Exchanger<>();
    private double[][] buffer;
    private int nextRow = Integer.MAX_VALUE;

    public RandomGenerator(final int rows, final int columns) {
        buffer = new double[rows][columns];
        generator.submit(new Callable<Void>() {
            @Override
            public Void call() throws Exception {
                Random random = new Random();
                double[][] buffer2 = new double[rows][columns];
                while (!Thread.interrupted()) {
                    for (int r = 0; r < rows; r++)
                        for (int c = 0; c < columns; c++)
                            buffer2[r][c] = random.nextGaussian();
                    buffer2 = exchanger.exchange(buffer2);
                }
                return null;
            }
        });
    }

    public double[] nextArray() throws InterruptedException {
        if (nextRow >= buffer.length) {
            buffer = exchanger.exchange(buffer);
            nextRow = 0;
        }
        return buffer[nextRow++];
    }
}
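The class above uses a single generator thread. One possible way to spread generation over several cores is sketched below, as an assumption rather than part of the original answer: each producer thread gets its own Exchanger and its own Random, and the consumer visits the producers round-robin. A separate Exchanger per producer is used because an Exchanger pairs up whichever two threads arrive, so producers sharing one could swap buffers with each other. MultiRandomGenerator and fillForever are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Exchanger;

final class MultiRandomGenerator {
    private final List<Exchanger<double[][]>> exchangers = new ArrayList<>();
    private final double[][][] held;  // buffer the consumer currently holds, per producer
    private final int rows;
    private int producer = -1;        // index of the producer whose buffer is being read
    private int nextRow = Integer.MAX_VALUE;

    MultiRandomGenerator(int producers, int rows, int columns) {
        this.rows = rows;
        this.held = new double[producers][rows][columns];
        for (int p = 0; p < producers; p++) {
            final Exchanger<double[][]> exchanger = new Exchanger<>();
            exchangers.add(exchanger);
            Thread t = new Thread(() -> fillForever(exchanger, rows, columns), "generator-" + p);
            t.setDaemon(true); // do not keep the JVM alive just for the generators
            t.start();
        }
    }

    /** Producer loop: fill a buffer, swap it for the consumer's spent one, repeat. */
    private static void fillForever(Exchanger<double[][]> exchanger, int rows, int columns) {
        Random random = new Random(); // one Random per thread, never shared
        double[][] buffer = new double[rows][columns];
        try {
            while (true) {
                for (double[] row : buffer)
                    for (int c = 0; c < row.length; c++) row[c] = random.nextGaussian();
                buffer = exchanger.exchange(buffer);
            }
        } catch (InterruptedException e) {
            // interrupted: let the daemon thread end
        }
    }

    /** Returns the next row. The array is recycled, so use it before its buffer is handed back. */
    double[] nextArray() {
        if (nextRow >= rows) {
            // Current buffer is spent: move to the next producer and swap buffers with it
            producer = (producer + 1) % exchangers.size();
            try {
                held[producer] = exchangers.get(producer).exchange(held[producer]);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for random data", e);
            }
            nextRow = 0;
        }
        return held[producer][nextRow++];
    }
}
```

While the consumer reads one producer's buffer, every producer is either refilling its other buffer or already waiting with a full one, so generation can overlap with consumption. Whether this beats the single-generator version depends on buffer size and core count and would need measuring.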

Random is thread-safe and synchronized, so threads that share one instance contend with each other. This means each thread needs its own Random to generate numbers concurrently.
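A minimal sketch of the one-Random-per-thread rule, with hypothetical names (PerThreadRandom, fillInParallel): each worker creates its own Random and writes only to its own row, so the threads share no generator state. ThreadLocalRandom.current() from java.util.concurrent is an alternative way to get a per-thread generator.

```java
import java.util.Random;

final class PerThreadRandom {
    /** Fills one row per thread; every thread creates and uses only its own Random. */
    static double[][] fillInParallel(int threads, int perThread) {
        double[][] out = new double[threads][perThread];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final double[] row = out[t]; // each thread writes to a distinct row
            workers[t] = new Thread(() -> {
                Random rng = new Random(); // private to this thread
                for (int i = 0; i < row.length; i++) row[i] = rng.nextGaussian();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join makes the worker's writes visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = fillInParallel(4, 1_000_000);
        System.out.println("rows filled: " + data.length);
    }
}
```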

"how to handle the general case of generating large streams of small, independent primitives in parallel and consuming them from a single thread."

I would use an Exchanger<double[][]> to populate values in the background and pass them efficiently (without much GC overhead).
