Summing each element of two arrays in parallel

Question

I have two 2-D arrays that I want to sum element-by-element. Both arrays are the same size (same number of rows and columns). The result should be a final array of the same size containing the element-by-element sums.

How can I use Java's Fork-Join Framework, or parallelism in general, to do such a task? Does it make sense to use parallelism for this problem?

Here is my unfinished attempt below with Java's Fork-Join framework:

public class SumArray extends RecursiveTask<int[][]> {

    private static final int ROW_CUTOFF = 10;
    private static final int COL_CUTOFF = 10;

    int[][] left_;
    int[][] right_;
    int rowLo_;
    int rowHi_;
    int colLo_;
    int colHi_;

    SumArray(int[][] left, int[][] right, int rowLo, int rowHi, int colLo, int colHi) {
        left_ = left;
        right_ = right;
        rowLo_ = rowLo;
        rowHi_ = rowHi;
        colLo_ = colLo;
        colHi_ = colHi;
    }

    @Override
    protected int[][] compute() {
        if (rowHi_ - rowLo_ <= ROW_CUTOFF && colHi_ - colLo_ <= COL_CUTOFF) {
            for (int i = rowLo_; i < rowHi_; i++) {
                for (int j = colLo_; j < colHi_; j++) {
                    left_[i][j] += right_[i][j];
                }
            }
            return left_;
        }
        int rowMid = rowLo_ + ((rowHi_ - rowLo_) / 2);
        int colMid = colLo_ + ((colHi_ - colLo_) / 2);
        SumArray topLeft = new SumArray(left_, right_, rowLo_, rowMid, colLo_, colMid);
        SumArray topRight = new SumArray(left_, right_, rowLo_, rowMid, colMid, colHi_);
        topLeft.fork();
        int[][] topRightSummed = topRight.compute();
        int[][] topLeftSummed = topLeft.join();
        // ???

I can similarly find the bottom left, and bottom right arrays, but how do I join these arrays while maintaining the performance of parallelism? Should I be using shared memory?

Answer 1

Before throwing threads at this problem, optimize the use of a single core. CPU cache misses make a measurable difference in cases like this. Consider the code below: one method sums the values as data[x][y], walking each row in order, and the other as data[y][x], jumping to a different row on every access. The first typically suffers far fewer CPU cache misses and is faster as a result. The following code can be used to demonstrate that behavior.

public class Sum2D {

    public static void main( String[] args ) {
        int[][] data = createGrid(100);

        long sum = 0;
        long start1 = System.currentTimeMillis();
        for ( int i=0; i<100000; i++ ) {
            sum += sumAcrossFirst(data);
        }

        long end1 = System.currentTimeMillis();

        long start2 = System.currentTimeMillis();
        for ( int i=0; i<100000; i++ ) {
            sum += sumAcrossSecond(data);
        }

        long end2 = System.currentTimeMillis();

        double duration1 = (end1-start1)/1000.0;
        double duration2 = (end2-start2)/1000.0;
        System.out.println("duration1 = " + duration1);
        System.out.println("duration2 = " + duration2);
        System.out.println("sum = " + sum);
    }

    private static int[][] createGrid(int size) {
        int[][] data = new int[size][size];

        for ( int x=0; x<size; x++ ) {
            for ( int y=0; y<size; y++ ) {
                data[x][y] = 1;
            }
        }

        return data;
    }

    private static long sumAcrossFirst(int[][] data) {
        long sum = 0;

        int size = data.length;
        for ( int x=0; x<size; x++ ) {
            for ( int y=0; y<size; y++ ) {
                sum += data[x][y];
            }
        }

        return sum;
    }

    private static long sumAcrossSecond(int[][] data) {
        long sum = 0;

        int size = data.length;
        for ( int x=0; x<size; x++ ) {
            for ( int y=0; y<size; y++ ) {
                sum += data[y][x];
            }
        }

        return sum;
    }


}

Another optimisation is to flatten the int[][] into a single int[]. That involves less pointer chasing, and modern CPU prefetchers will kick in and keep the next part of the array in cache for you.
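As a rough sketch of that flattening idea: the class below stores a grid row-major in one int[] and sums two grids in a single linear pass. The name FlatGrid and its methods are made up for illustration and are not from any library.

```java
// Hypothetical helper: a rows x cols grid stored row-major in a single int[].
final class FlatGrid {
    final int rows;
    final int cols;
    final int[] cells; // element (r, c) lives at cells[r * cols + c]

    FlatGrid(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.cells = new int[rows * cols]; // zero-filled by the JVM
    }

    int get(int r, int c) {
        return cells[r * cols + c];
    }

    void set(int r, int c, int value) {
        cells[r * cols + c] = value;
    }

    // Element-wise sum: one pass over contiguous memory, no per-row array lookups.
    static FlatGrid sum(FlatGrid a, FlatGrid b) {
        if (a.rows != b.rows || a.cols != b.cols) {
            throw new IllegalArgumentException("grids must have the same shape");
        }
        FlatGrid out = new FlatGrid(a.rows, a.cols);
        for (int i = 0; i < out.cells.length; i++) {
            out.cells[i] = a.cells[i] + b.cells[i];
        }
        return out;
    }
}
```

Whether this beats int[][] on your data is something to measure; the point is only that the inner loop touches three flat arrays instead of following a reference per row.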

When going parallel, you have to consider the same cache behavior and recognize that using multiple threads has overheads, so smaller arrays will sum faster on a single thread. The threshold is best measured because it varies by CPU, but in general it will be somewhere around 1,000 elements or more. That said, I usually wait for the input data to pass a million cells before I worry about the extra complexity. Summing across arrays is fast.
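One way to act on that threshold, sketched with a parallel IntStream over the row indexes: the work only goes parallel once the grid has enough cells. The class name AdaptiveSum and the constant PARALLEL_THRESHOLD are illustrative assumptions, and the cutoff value is simply the million-cell rule of thumb above, not a measured number.

```java
import java.util.stream.IntStream;

final class AdaptiveSum {
    // Assumed cutoff taken from the rule of thumb above; measure on your own hardware.
    static final long PARALLEL_THRESHOLD = 1_000_000L;

    static int[][] sum(int[][] a, int[][] b) {
        int rows = a.length;
        int cols = rows == 0 ? 0 : a[0].length;
        int[][] result = new int[rows][cols];

        IntStream rowIndexes = IntStream.range(0, rows);
        if ((long) rows * cols >= PARALLEL_THRESHOLD) {
            // Parallel streams run on the common ForkJoinPool.
            rowIndexes = rowIndexes.parallel();
        }

        // Each index writes only to its own row, so no locking is needed.
        rowIndexes.forEach(i -> {
            for (int j = 0; j < cols; j++) {
                result[i][j] = a[i][j] + b[i][j];
            }
        });
        return result;
    }
}
```

Small inputs take the sequential path and pay no thread hand-off cost; large ones are split by row across the pool.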

The fastest way to sum the arrays is to use SIMD instructions. Unfortunately, Java does not expose them directly; you need JNI or something similar, or you rely on the JIT compiler auto-vectorizing a simple loop. Fork/Join does an admirable job, but it has some overhead before it gets up to speed, which means the number of ints required to break even between parallel and single-core execution will be higher.

Having multiple threads write into the same, single result array makes sense. Just be aware that writes from multiple CPU cores can cause cache invalidations between the cores, which leads to thrashing if two cores keep writing to the same cache line (false sharing).
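A sketch of one way to keep the cores out of each other's way: give each worker a contiguous band of rows, so that neighbouring writes mostly come from the same thread. The class BandedSum and its workers parameter are hypothetical names for this example, and the band layout is an assumption about what helps rather than a measured result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BandedSum {
    // workers must be at least 1.
    static int[][] sum(int[][] a, int[][] b, int workers)
            throws InterruptedException, ExecutionException {
        int rows = a.length;
        int cols = rows == 0 ? 0 : a[0].length;
        int[][] result = new int[rows][cols];

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            int bandSize = (rows + workers - 1) / workers; // ceiling division
            for (int lo = 0; lo < rows; lo += bandSize) {
                final int from = lo;
                final int to = Math.min(rows, lo + bandSize);
                // Each task owns rows [from, to) and writes nowhere else.
                pending.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        for (int j = 0; j < cols; j++) {
                            result[i][j] = a[i][j] + b[i][j];
                        }
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the band; also makes its writes visible here
            }
        } finally {
            pool.shutdown();
        }
        return result;
    }
}
```

Creating a pool per call is only for brevity; in real code the pool would be shared and reused.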

So to get things started, here is an approach that you are free to hack around. It uses a plain Java Executor backed by a fixed-size thread pool; ForkJoinPool is itself an ExecutorService, with work stealing added on top.

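import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Note: the pool's worker threads are non-daemon. In a real program, hold the pool as an
// ExecutorService and call shutdown() when done, otherwise the JVM will not exit.
private static Executor pool = Executors.newFixedThreadPool( Runtime.getRuntime().availableProcessors() );

private static int[][] sumParallel( int[][] a, int[][] b ) throws InterruptedException {
    // a new int[][] is already zero-filled; size it from the input so non-square grids work
    int[][] result = new int[a.length][a.length == 0 ? 0 : a[0].length];
    CountDownLatch latch = new CountDownLatch(a.length);  // one count per row task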

    for ( int i=0; i<a.length; i++ ) {
        pool.execute( new SumTask(latch, a,b,i, result) );
    }

    latch.await();

    return result;
}

public static class SumTask implements Runnable {
    private CountDownLatch latch;

    private int[][] a;
    private int[][] b;
    private int     row;
    private int[][] result;

    public SumTask(CountDownLatch latch, int[][] a, int[][] b, int row, int[][] result) {
        this.latch = latch;

        this.a = a;
        this.b = b;
        this.row = row;
        this.result = result;
    }

    public void run() {
        // use the row's own length so non-square grids work
        for ( int y=0; y<a[row].length; y++ ) {
            result[row][y] = a[row][y] + b[row][y];
        }

        latch.countDown();
    }
}

And for a bit more fun, here is a Fork/Join equivalent:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.RecursiveTask;

public class Sum2DFJ {

    public static void main( String[] args ) {
        int[][] data = {{1,2,3},{1,2,3},{1,2,3}};

        SumTask task = new SumTask(data, data);
        ForkJoinPool pool = new ForkJoinPool();

        // invoke() submits the task and blocks until its result is ready
        int[][] result = pool.invoke(task);

        for ( int x=0; x<result.length; x++ ) {
            for ( int y=0; y<result[x].length; y++ ) {
                System.out.println("result[" + x + "][" + y + "] = " + result[x][y]);
            }
        }
    }

}


class SumTask extends RecursiveTask<int[][]> {

    private final int[][] a;
    private final int[][] b;

    public SumTask( int[][] a, int[][] b ) {
        this.a = a;
        this.b = b;
    }

    @Override
    protected int[][] compute() {
        // a new int array is zero-filled, so no explicit initialisation is needed
        int[][] result = new int[a.length][a.length == 0 ? 0 : a[0].length];

        // one child task per row; every child writes to a different row of result
        List<SumChildTask> children = new ArrayList<>();

        for ( int i=0; i<a.length; i++ ) {
            children.add( new SumChildTask(a,b,i, result) );
        }

        // forks all children and waits for them to finish
        invokeAll(children);

        return result;
    }
}

class SumChildTask extends RecursiveAction {

    private final int[][] a;
    private final int[][] b;
    private final int row;
    private final int[][] result;

    public SumChildTask(int[][] a, int[][] b, int row, int[][] result) {
        this.a = a;
        this.b = b;
        this.row = row;
        this.result = result;
    }

    @Override
    protected void compute() {
        // use the row's own length so non-square grids work
        for ( int i=0; i<a[row].length; i++ ) {
            result[row][i] = a[row][i] + b[row][i];
        }
    }
}

Answer 2

Break the input into segments. When you get to the bottom:

// compute() method, once the segment is at or below the threshold

// int[][] A = original A matrix
// int[][] B = original B matrix
// int[][] C = newly instantiated result matrix

// int start = starting position (first row of this segment)
// int end   = ending position (one past the last row of this segment)

// column size is equal in all rows
int columns = A[0].length;

// do all the rows in A and B for this segment
for (int i = start; i < end; i++) {

    // row references for A, B and C save a subscript in the inner loop
    int[] aSide = A[i];
    int[] bSide = B[i];
    int[] cSide = C[i];

    // do all the columns in both
    for (int j = 0; j < columns; j++) {

        // C(i,j) = A(i,j) + B(i,j)
        cSide[j] = aSide[j] + bSide[j];
    }
}
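For completeness, here is a sketch of the splitting step that would sit around that leaf computation: a RecursiveAction that halves its row range until it reaches a cutoff, then sums the rows directly into a shared result matrix. The class name RangeSumAction and the ROW_THRESHOLD value are assumptions for illustration; the cutoff in particular should be tuned by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

final class RangeSumAction extends RecursiveAction {
    private static final int ROW_THRESHOLD = 64; // assumed cutoff; tune by measuring

    private final int[][] a;
    private final int[][] b;
    private final int[][] c;
    private final int start; // first row of this segment
    private final int end;   // one past the last row of this segment

    RangeSumAction(int[][] a, int[][] b, int[][] c, int start, int end) {
        this.a = a;
        this.b = b;
        this.c = c;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start <= ROW_THRESHOLD) {
            // Leaf: sum this segment's rows directly into the shared result.
            for (int i = start; i < end; i++) {
                int[] aSide = a[i];
                int[] bSide = b[i];
                int[] cSide = c[i];
                for (int j = 0; j < cSide.length; j++) {
                    cSide[j] = aSide[j] + bSide[j];
                }
            }
            return;
        }
        // Otherwise split the row range in half and run both halves.
        int mid = start + (end - start) / 2;
        invokeAll(new RangeSumAction(a, b, c, start, mid),
                  new RangeSumAction(a, b, c, mid, end));
    }

    // Convenience entry point that allocates the result and runs on the common pool.
    static int[][] sum(int[][] a, int[][] b) {
        int[][] c = new int[a.length][a.length == 0 ? 0 : a[0].length];
        ForkJoinPool.commonPool().invoke(new RangeSumAction(a, b, c, 0, a.length));
        return c;
    }
}
```

Because every leaf writes to a disjoint set of rows in the same matrix, nothing needs to be merged when the halves join, which is why the task returns no value.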


solution1
4 2014-06-30 09:01:00

solution2
0 2014-06-30 17:57:26
