Parallel processing of arrays using ExecutorService and arrayCopy()

I have a variety of different algorithms, each of which is resource intensive and has to be run on millions of inputs. I would like to divide the inputs into blocks, have the blocks processed in parallel, and finally have the results assembled into a single output array in the correct order.

I have been researching this, and the consensus seems to be that I should use ExecutorService and System.arraycopy(). However, I am not sure how to determine the optimal number of threads to create, and I do not know how to structure the code so that it minimizes the risk of bugs. I would also like to be sure that each thread terminates after it has created its result array. Finally, the code I wrote below is giving me a null pointer error.

Can someone please edit the code below so that it accomplishes the goals stated above as fast as possible, while minimizing the risk of bugs? It would be nice if the code could run in 5 or 10 milliseconds. The random numbers in the array are only placeholders that serve as a benchmark for comparing threading options. I do not need to optimize the random number generation, because my actual algorithms have nothing to do with random number generation. Here is my work in progress:

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelArrays {
    private static final int numJobs = 5;
    static int numElements = 1473200;
    static int blockSize = 300000;
    static int remaining;
    String number;
    static int[][] data2D = new int[numJobs][];
    static int[] data1D;
    static int size;
    static int currIdx;

    public static void main(String args[]) {
        long startTime = System.currentTimeMillis();
        remaining = numElements-blockSize;
        // create a pool of threads, 10 max jobs will execute in parallel
        ExecutorService threadPool = Executors.newFixedThreadPool(10);
        // submit jobs to be executing by the pool
        for (int i = 0; i < numJobs; i++) {
            currIdx = i;
            System.out.println("This coming iteration would leave us with remaining, blockSize: "+remaining+", "+blockSize);
            if(remaining>=0){System.out.println("blockSize is: "+blockSize);}
            else{
                blockSize = (blockSize+remaining);
                remaining = 0;
                System.out.println("else blockSize is: "+blockSize);
            }
            System.out.println("After iteration, remaining, blockSize are: "+remaining+", "+blockSize);
            threadPool.submit(new Runnable() {
                public void run() {
                    Random r = new Random();
                    data2D[currIdx] = new int[blockSize];
                    for(int j=0;j<data2D[currIdx].length;j++){
                        data2D[currIdx][j] = r.nextInt(255)*r.nextInt(255)*r.nextInt(255);
                    }
                }
            });
            remaining -= blockSize;
        }
        //Now collapse data2D into a 1D array
        data1D = new int[numElements];
        int startPos = 0;
        for(int k=0;k<numJobs;k++){
            System.out.println("startPos is: "+startPos);
            //arraycopy(Object src, int srcPos, Object dest, int destPos, int length);
            System.out.println("k is: "+k);
            System.out.println("data2D[k].length is: "+data2D[k].length);
            System.arraycopy(data2D[k], 0, data1D, startPos, data2D[k].length);
            startPos += data2D[k].length;
        }
        threadPool.shutdown();
        System.out.println("Main thread exiting.");
        long endTime = System.currentTimeMillis();
        System.out.println("Elapsed time is: "+(endTime-startTime));
    }
}  

SECOND EDIT:

In response to Ralf H's suggestions, I have edited my code as follows. It still throws the same null pointer exception, whose stack trace I include again below. I would much appreciate any help rewriting this code so that it runs correctly without throwing the null pointer exception:

package myPackage;

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelArrays {
    private final static int numJobs = 5;
    static int numElements = 1473200;
    static int blockSize = 300000;
    static int remaining;
    String number;
    static int[][] data2D = new int[numJobs][];
    static int[] data1D;
//  static int size;
    static int currIdx;
    static int numAdded = 0;

    public static void main(String args[]) {runAlgorithm();}

    static void runAlgorithm(){
        long startTime = System.currentTimeMillis();
        remaining = numElements-blockSize;
        ExecutorService threadPool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < numJobs; i++) {// submit jobs to be executing by the pool
            currIdx = i;
            if(remaining<0){//last block will be smaller than the rest
                blockSize = (blockSize+remaining);
                remaining = 0;
            }
            final int fCurrIdx = i;
            threadPool.submit(new Runnable() {
                public void run() {
                    Random r = new Random();
                    data2D[fCurrIdx] = new int[blockSize];
                    System.out.println("fCurrIdx is: "+fCurrIdx);
                    for(int j=0;j<data2D[fCurrIdx].length;j++){
                        data2D[fCurrIdx][j] = r.nextInt(255)*r.nextInt(255)*r.nextInt(255);
                    }
                    numAdded += 1;
                }
            });
            remaining -= blockSize;
        }
        //Now collapse data2D into a 1D array
        data1D = new int[numElements];
        System.out.println("numAdded, data2D.length is: "+numAdded+", "+data2D.length);
        int startPos = 0;
        for(int k=0;k<numJobs;k++){
            System.out.println("startPos is: "+startPos);
            //arraycopy(Object src, int srcPos, Object dest, int destPos, int length);
            System.out.println("k, data2D["+k+"].length are: "+k+", "+data2D[k].length); // NullPointerException here
            System.arraycopy(data2D[k], 0, data1D, startPos, data2D[k].length);
            startPos += data2D[k].length;
        }
        threadPool.shutdown();
        System.out.println("Main thread exiting.");
        long endTime = System.currentTimeMillis();
        System.out.println("Elapsed time is: "+(endTime-startTime));
    }
}  

Here is the stack trace for the null pointer error thrown by the revised code:

Exception in thread "main" java.lang.NullPointerException  
    at myPackage.ParallelArrays.runAlgorithm(ParallelArrays.java:52)  
    at myPackage.ParallelArrays.main(ParallelArrays.java:19)  

I think the problem is that the code needs to use a Future object along with the ExecutorService, but I am not sure of the syntax for this specific code.

I think ForkJoinPool is a better fit for this task. It is designed for efficient parallel processing of work that can be split recursively; see http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html .
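As a rough illustration of the fork/join suggestion, here is a minimal sketch, not code from this thread. The names FillTask, fill and work, and the THRESHOLD value, are assumptions made up for the example. Each task writes directly into its own slice of the shared output array, so no per-block arrays or System.arraycopy() calls are needed:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.function.IntUnaryOperator;

// Hypothetical task for illustration; not part of the original post.
class FillTask extends RecursiveAction {
    // Assumed cut-off below which a range is processed sequentially; tune by measuring.
    private static final int THRESHOLD = 50_000;

    private final int[] out;
    private final int from; // inclusive
    private final int to;   // exclusive
    private final IntUnaryOperator work;

    FillTask(int[] out, int from, int to, IntUnaryOperator work) {
        this.out = out;
        this.from = from;
        this.to = to;
        this.work = work;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: compute this slice directly. Slices never overlap,
            // so no two tasks write to the same element.
            for (int i = from; i < to; i++) {
                out[i] = work.applyAsInt(i);
            }
            return;
        }
        // Otherwise split the range in half and process both halves in parallel.
        int mid = (from + to) >>> 1;
        invokeAll(new FillTask(out, from, mid, work), new FillTask(out, mid, to, work));
    }

    // Fills a new array of the given size; returns only after all subtasks are done.
    static int[] fill(int numElements, IntUnaryOperator work) {
        int[] out = new int[numElements];
        ForkJoinPool.commonPool().invoke(new FillTask(out, 0, numElements, work));
        return out;
    }
}
```

A call such as FillTask.fill(1473200, i -> i * 2) would stand in for the real per-element algorithm. The common pool sizes itself from the number of available processors, which sidesteps the question of how many threads to create.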

I ran into the same kind of problem lately; it was caused by several tasks sharing the same unsynchronized variables (comment left for other people reading this :)).

In your case, as @Ralf mentioned, you are not waiting for the pool's tasks to finish. So when the main thread reaches the copy loop, your data2D array can still be filled with null (data2D[k] == null), and you get an NPE when evaluating data2D[k].length.

I ran the second version of your code 10 times, and it threw the NPE on some of the runs. The exception goes away when you call awaitTermination() before collapsing the array:

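// requires: import java.util.concurrent.TimeUnit;
threadPool.shutdown(); // stop accepting new tasks; already submitted tasks still run
try {
    // block until every submitted task has finished
    while (!threadPool.awaitTermination(1, TimeUnit.SECONDS)) ;
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
    e.printStackTrace();
}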

//Now collapse data2D into a 1D array
data1D = new int[numElements];
...
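The question mentions Future; one possible way to use it here is to submit Callable tasks with invokeAll(), which returns only after every task has completed. The following is a sketch under assumptions, not the asker's final code: BlockProcessor, process and work are hypothetical names, and sizing the pool with availableProcessors() is just a common starting point for CPU-bound work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntUnaryOperator;

// Hypothetical helper for illustration; not part of the original post.
class BlockProcessor {

    // Computes work.applyAsInt(i) for every index i in [0, numElements),
    // one block per task, and assembles the blocks in their original order.
    static int[] process(int numElements, int blockSize, IntUnaryOperator work) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        // Assumption: one thread per core is a reasonable start for CPU-bound work.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<int[]>> tasks = new ArrayList<>();
            for (int start = 0; start < numElements; start += blockSize) {
                // Per-task locals: nothing mutable is shared between tasks.
                final int from = start;
                final int len = Math.min(blockSize, numElements - start);
                tasks.add(() -> {
                    int[] block = new int[len];
                    for (int j = 0; j < len; j++) {
                        block[j] = work.applyAsInt(from + j);
                    }
                    return block;
                });
            }
            // invokeAll waits for all tasks and returns the futures
            // in the same order as the task list.
            List<Future<int[]>> futures = pool.invokeAll(tasks);
            int[] result = new int[numElements];
            int pos = 0;
            for (Future<int[]> f : futures) {
                int[] block = f.get(); // already complete; rethrows task failures
                System.arraycopy(block, 0, result, pos, block.length);
                pos += block.length;
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for blocks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a block task failed", e.getCause());
        } finally {
            pool.shutdown(); // lets the worker threads exit once they are idle
        }
    }
}
```

With the numbers from the question, the call would look like BlockProcessor.process(1473200, 300000, i -> i * 2), where the lambda is a placeholder for the real algorithm. Because each task returns its own array through a Future, the static data2D, currIdx and blockSize fields are no longer needed.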

You are using currIdx many times in the Runnable. Because currIdx is static, the submitting loop keeps writing to it from outside the Runnable while the tasks are reading it. Inside the Runnable, it is better to use a local variable, and to make it final before entering the Runnable:

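final int fCurrIdx = i;
// snapshot blockSize too: it is static and the loop changes it for the last block
final int fBlockSize = blockSize;
threadPool.submit(new Runnable() {
    public void run() {
        Random r = new Random();
        int[] data = new int[fBlockSize];
        for( int j=0; j<data.length; j++){
            data[j] = r.nextInt(255) * r.nextInt(255) * r.nextInt(255);
        }
        // publish the block only after it is completely filled
        data2D[fCurrIdx] = data;
    }
});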

In fact, I would create the new int[blockSize] locally, fill it, and assign it to data2D only at the end.

Are you sure you need a new Random every time? Is there a reason to make currIdx (or the other fields) static?
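On the question of creating a new Random per task, one option (an assumption on my part, not something proposed in this thread) is ThreadLocalRandom, which gives each worker thread its own generator without allocating a new one for every block. RandomBlock and randomBlock are hypothetical names:

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration; not part of the original post.
class RandomBlock {

    // Builds one block of placeholder values, mirroring the expression
    // used in the question. Call this from inside the task so that
    // current() returns the generator of the worker thread doing the work.
    static int[] randomBlock(int length) {
        ThreadLocalRandom r = ThreadLocalRandom.current();
        int[] block = new int[length];
        for (int j = 0; j < length; j++) {
            // nextInt(255) yields 0..254, so the product fits in an int.
            block[j] = r.nextInt(255) * r.nextInt(255) * r.nextInt(255);
        }
        return block;
    }
}
```

Inside the Runnable above, this would replace the Random allocation and the fill loop with a single call that takes the block length as an argument rather than reading a static field.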
