
Parallel processing of arrays using ExecutorService and System.arraycopy()

I have a variety of different algorithms which are each resource intensive, and which each have to be processed on millions of inputs. I would like to divide the inputs into blocks, then have the blocks processed in parallel, and then finally have the results assembled into a single output array in correct order.

I have been doing research on the matter, and the consensus seems to be that I should use ExecutorService and arraycopy(). However, I am not sure how to determine the optimal number of threads to create, and I do not know how to structure the code in a way that eliminates the risk of bugs. It would be nice if I knew that each thread was terminated after it creates its resulting array. Finally, the code I wrote below is also giving me a null pointer error.

Can some of you please edit the code below so that it accomplishes my above stated goals as fast as possible, while eliminating the risk of bugs? It would be nice if the code below could run in 5 or 10 milliseconds. The random numbers in the array are nothing more than placeholders to serve as a benchmark to compare threading options. I do not need to optimize the random number generation because my actual algorithms have nothing to do with random number generation. Here is my work in progress:

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelArrays {
    private static final int numJobs = 5;
    static int numElements = 1473200;
    static int blockSize = 300000;
    static int remaining;
    String number;
    static int[][] data2D = new int[numJobs][];
    static int[] data1D;
    static int size;
    static int currIdx;

    public static void main(String args[]) {
        long startTime = System.currentTimeMillis();
        remaining = numElements-blockSize;
        // create a pool of threads, 10 max jobs will execute in parallel
        ExecutorService threadPool = Executors.newFixedThreadPool(10);
        // submit jobs to be executing by the pool
        for (int i = 0; i < numJobs; i++) {
            currIdx = i;
            System.out.println("This coming iteration would leave us with remaining, blockSize: "+remaining+", "+blockSize);
            if(remaining>=0){System.out.println("blockSize is: "+blockSize);}
            else{
                blockSize = (blockSize+remaining);
                remaining = 0;
                System.out.println("else blockSize is: "+blockSize);
            }
            System.out.println("After iteration, remaining, blockSize are: "+remaining+", "+blockSize);
            threadPool.submit(new Runnable() {
                public void run() {
                    Random r = new Random();
                    data2D[currIdx] = new int[blockSize];
                    for(int j=0;j<data2D[currIdx].length;j++){
                        data2D[currIdx][j] = r.nextInt(255)*r.nextInt(255)*r.nextInt(255);
                    }
                }
            });
            remaining -= blockSize;
        }
        //Now collapse data2D into a 1D array
        data1D = new int[numElements];
        int startPos = 0;
        for(int k=0;k<numJobs;k++){
            System.out.println("startPos is: "+startPos);
            //arraycopy(Object src, int srcPos, Object dest, int destPos, int length);
            System.out.println("k is: "+k);
            System.out.println("data2D[k].length is: "+data2D[k].length);
            System.arraycopy(data2D[k], 0, data1D, startPos, data2D[k].length);
            startPos += data2D[k].length;
        }
        threadPool.shutdown();
        System.out.println("Main thread exiting.");
        long endTime = System.currentTimeMillis();
        System.out.println("Elapsed time is: "+(endTime-startTime));
    }
}  

SECOND EDIT:

In response to Ralf H's suggestions, I have edited my code as follows. It is still throwing the same null pointer exception, which I will include again below. I would much appreciate any help rewriting this code so that it runs correctly without throwing the null pointer exception:

package myPackage;

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelArrays {
    private final static int numJobs = 5;
    static int numElements = 1473200;
    static int blockSize = 300000;
    static int remaining;
    String number;
    static int[][] data2D = new int[numJobs][];
    static int[] data1D;
//  static int size;
    static int currIdx;
    static int numAdded = 0;

    public static void main(String args[]) {runAlgorithm();}

    static void runAlgorithm(){
        long startTime = System.currentTimeMillis();
        remaining = numElements-blockSize;
        ExecutorService threadPool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < numJobs; i++) {// submit jobs to be executing by the pool
            currIdx = i;
            if(remaining<0){//last block will be smaller than the rest
                blockSize = (blockSize+remaining);
                remaining = 0;
            }
            final int fCurrIdx = i;
            threadPool.submit(new Runnable() {
                public void run() {
                    Random r = new Random();
                    data2D[fCurrIdx] = new int[blockSize];
                    System.out.println("fCurrIdx is: "+fCurrIdx);
                    for(int j=0;j<data2D[fCurrIdx].length;j++){
                        data2D[fCurrIdx][j] = r.nextInt(255)*r.nextInt(255)*r.nextInt(255);
                    }
                    numAdded += 1;
                }
            });
            remaining -= blockSize;
        }
        //Now collapse data2D into a 1D array
        data1D = new int[numElements];
        System.out.println("numAdded, data2D.length is: "+numAdded+", "+data2D.length);
        int startPos = 0;
        for(int k=0;k<numJobs;k++){
            System.out.println("startPos is: "+startPos);
            //arraycopy(Object src, int srcPos, Object dest, int destPos, int length);
            System.out.println("k, data2D["+k+"].length are: "+k+", "+data2D[k].length); // NullPointerException here
            System.arraycopy(data2D[k], 0, data1D, startPos, data2D[k].length);
            startPos += data2D[k].length;
        }
        threadPool.shutdown();
        System.out.println("Main thread exiting.");
        long endTime = System.currentTimeMillis();
        System.out.println("Elapsed time is: "+(endTime-startTime));
    }
}  

Here is the stack trace for the null pointer error being thrown by the revised code:

Exception in thread "main" java.lang.NullPointerException  
    at myPackage.ParallelArrays.runAlgorithm(ParallelArrays.java:52)  
    at myPackage.ParallelArrays.main(ParallelArrays.java:19)  

I think the problem is that the code needs to use a Future object along with the ExecutorService, but I am not sure of the syntax for this specific code.

I think ForkJoinPool is better for this task. It's designed for efficient parallel processing, see http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html .
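
To make the ForkJoinPool suggestion concrete, here is a minimal sketch, not a tuned implementation. It assumes Java 8 or later (for ForkJoinPool.commonPool()), and the names ForkJoinFill, FillTask, transform and THRESHOLD are made up for illustration; transform stands in for whatever the real per-element algorithm is. Each task writes straight into its own slice of the output array, so no arraycopy step is needed afterwards.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ForkJoinFill {

    // Hypothetical stand-in for the real, expensive per-element algorithm.
    static int transform(int index) {
        return index * 3;
    }

    // Fills out[from, to) and splits itself in half while the range is large.
    static class FillTask extends RecursiveAction {
        // Assumed cut-off; the best value depends on the cost of transform().
        private static final int THRESHOLD = 50_000;
        private final int[] out;
        private final int from;
        private final int to;

        FillTask(int[] out, int from, int to) {
            this.out = out;
            this.from = from;
            this.to = to;
        }

        @Override
        protected void compute() {
            if (to - from <= THRESHOLD) {
                // Small enough: do the work directly in this worker thread.
                for (int i = from; i < to; i++) {
                    out[i] = transform(i);
                }
                return;
            }
            int mid = (from + to) >>> 1;
            // Runs both halves, possibly in parallel, and waits for both.
            invokeAll(new FillTask(out, from, mid), new FillTask(out, mid, to));
        }
    }

    static int[] process(int numElements) {
        int[] out = new int[numElements];
        // invoke() returns only after the whole task tree has finished.
        ForkJoinPool.commonPool().invoke(new FillTask(out, 0, numElements));
        return out;
    }

    public static void main(String[] args) {
        int[] result = process(1473200);
        System.out.println("Filled " + result.length + " elements");
    }
}
```

Because the slices never overlap, the tasks need no locking, and the common pool sizes itself from the number of available processors.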

I ran into the same kind of problem lately. It was caused by several tasks sharing the same non-synchronized variables (I mention this for other people reading).

In your case, as @Ralf mentioned, you're not waiting for the termination of the pool. So when the main thread reaches the copy loop, some or all entries of your data2D array can still be null (data2D[k] == null), and you get an NPE when doing data2D[k].length.

I ran the second version of your code about 10 times, and it throws the NPE only on some runs. The NPE goes away when you call awaitTermination() before reading the results:

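// needs: import java.util.concurrent.TimeUnit;
threadPool.shutdown(); // stop accepting new tasks; submitted tasks still run
try {
    // keep waiting, one second at a time, until every task has finished
    while (!threadPool.awaitTermination(1, TimeUnit.SECONDS)) ;
} catch (InterruptedException e) {
    e.printStackTrace();
}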

//Now collapse data2D into a 1D array
data1D = new int[numElements];
...
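
Since the question asks about Future, here is one way the same idea could look with Callable and invokeAll(). This is only a sketch under some assumptions: Java 8 or later for the lambda, a made-up class name BlockedFutures, and a placeholder compute() standing in for the real algorithm. Sizing the pool with availableProcessors() is a common starting point for CPU-bound work, not a guaranteed optimum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockedFutures {

    // Hypothetical stand-in for the real per-element algorithm.
    static int compute(int index) {
        return index * 2;
    }

    static int[] process(int numElements, int blockSize) {
        // Assumption: CPU-bound work, so one thread per core is a fair start.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<int[]>> tasks = new ArrayList<>();
            for (int start = 0; start < numElements; start += blockSize) {
                // Capture per-block values in final locals, not shared statics.
                final int from = start;
                final int len = Math.min(blockSize, numElements - start);
                tasks.add(() -> {
                    int[] block = new int[len];
                    for (int j = 0; j < len; j++) {
                        block[j] = compute(from + j);
                    }
                    return block;
                });
            }
            int[] result = new int[numElements];
            int pos = 0;
            // invokeAll() blocks until all tasks are done and returns the
            // futures in the same order as the task list.
            for (Future<int[]> f : pool.invokeAll(tasks)) {
                int[] block = f.get();
                System.arraycopy(block, 0, result, pos, block.length);
                pos += block.length;
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a block failed", e.getCause());
        } finally {
            pool.shutdown(); // lets the worker threads exit
        }
    }

    public static void main(String[] args) {
        int[] out = process(1473200, 300000);
        System.out.println("Assembled " + out.length + " elements");
    }
}
```

The number of blocks is derived from numElements and blockSize, so there is no separate numJobs to keep in sync, and the last, shorter block falls out of Math.min().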

You are using currIdx many times in the Runnable. Because currIdx is static, it is also written from outside that Runnable, so a task may see a different index than the one it was submitted with. Inside the Runnable, use a local variable instead, and make it final before entering the Runnable:

final int fCurrIdx = i;
threadPool.submit(new Runnable() {
    public void run() {
        Random r = new Random();
        int[] data = new int[blockSize];
        for( int j=0; j<data.length; j++){
            data[j] = r.nextInt(255) * r.nextInt(255) * r.nextInt(255);
        }
        data2D[fCurrIdx] = data;
    }
});

In fact, I would create the new int[blockSize] locally, fill it, and assign it to data2D at the end.

Are you sure you need a new Random every time? Is there a reason to make currIdx (or others) static?
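
To illustrate both points, here is a rough sketch that keeps everything the task needs in final locals and uses ThreadLocalRandom instead of creating a new Random per task. The class and method names (LocalCapture, fillBlocks) are invented for the example, and it assumes Java 8 or later for the lambda. Note that it also captures the block length in a local, because the static blockSize in the question is changed by the loop while earlier tasks may still be reading it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

class LocalCapture {

    static int[][] fillBlocks(int numElements, int blockSize) {
        // Round up so a final, shorter block is included.
        int numJobs = (numElements + blockSize - 1) / blockSize;
        final int[][] blocks = new int[numJobs][];
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int i = 0; i < numJobs; i++) {
            // Both values are fixed at submission time and never shared.
            final int idx = i;
            final int len = Math.min(blockSize, numElements - i * blockSize);
            pool.submit(() -> {
                // One generator per worker thread, no contention, no new object.
                ThreadLocalRandom r = ThreadLocalRandom.current();
                int[] data = new int[len];
                for (int j = 0; j < len; j++) {
                    data[j] = r.nextInt(255) * r.nextInt(255) * r.nextInt(255);
                }
                blocks[idx] = data; // publish only the finished array
            });
        }
        pool.shutdown();
        try {
            // Assumed generous upper bound for this placeholder workload.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return blocks;
    }

    public static void main(String[] args) {
        int[][] blocks = fillBlocks(1473200, 300000);
        System.out.println("Blocks: " + blocks.length
                + ", last block length: " + blocks[blocks.length - 1].length);
    }
}
```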
