
Guaranteeing the visibility of side effects of concurrent tasks in the Java Executor framework

I am experimenting with techniques for ensuring the visibility of side effects produced by concurrent tasks executed with the Java Executor framework. As a simple scenario, consider a hypothetical matrix multiplication problem.

Let's say that the matrices to multiply may be considerably large (e.g., a few thousand rows and columns) and that, to speed up the multiplication of such matrices, I implement a concurrent algorithm in which the calculation of each cell in the result matrix is treated as an independent (i.e., parallelizable) task. To simplify things, let's ignore that for small input matrices this parallelization may not be such a good idea.

So consider the first version of my program below:

public class MatrixMultiplier {

    private final int[][] m;
    private final int[][] n;
    private volatile int[][] result; //the (lazily computed) result of the matrix multiplication

    private final int numberOfMRows; //number of rows in M
    private final int numberOfNColumns; //number of columns in N
    private final int commonMatrixDimension; //number of columns in M and rows in N

    public MatrixMultiplier(int[][] m, int[][] n) {
        if(m[0].length != n.length)
            throw new IllegalArgumentException("Incompatible arguments: " + Arrays.toString(m) + " and " + Arrays.toString(n));
        this.m = m;
        this.n = n;
        this.numberOfMRows = m.length;
        this.numberOfNColumns = n[0].length;
        this.commonMatrixDimension = n.length;
    }

    public synchronized int[][] multiply() {
        if (result == null) {
            result = new int[numberOfMRows][numberOfNColumns];

            ExecutorService executor = createExecutor();

            Collection<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < numberOfMRows; i++) {
                final int finalI = i;
                for (int j = 0; j < numberOfNColumns; j++) {
                    final int finalJ = j;
                    tasks.add(new Callable<Void>() {
                        @Override
                        public Void call() throws Exception {
                            calculateCell(finalI, finalJ);
                            return null;
                        }
                    });
                }
            }

            try {
                executor.invokeAll(tasks);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                executor.shutdownNow();
            }

        }

        return result;
    }

    private ExecutorService createExecutor() {
        final int availableProcessors = Runtime.getRuntime().availableProcessors();
        final int processorsBound = availableProcessors + 1;
        final int maxConcurrency = numberOfMRows * numberOfNColumns;
        final int threadPoolSize = maxConcurrency < processorsBound ? maxConcurrency : processorsBound;
        return Executors.newFixedThreadPool(threadPoolSize);
    }

    private void calculateCell(int mRow, int nColumn) {
        int sum = 0;
        for (int k = 0; k < commonMatrixDimension; k++) {
            sum += m[mRow][k] * n[k][nColumn];
        }
        result[mRow][nColumn] = sum;
    }

}

As far as I understand, there is a problem with this implementation: some modifications to the result matrix made by the executed tasks may not necessarily be visible to the thread invoking multiply().

Assuming the above is correct, consider the alternative implementation of multiply() relying on explicit locks (the new lock-related code is marked with //<LRC>):

    public synchronized int[][] multiply() {
        if (result == null) {
            result = new int[numberOfMRows][numberOfNColumns];

            final Lock[][] locks = new Lock[numberOfMRows][numberOfNColumns]; //<LRC>
            for (int i = 0; i < numberOfMRows; i++) { //<LRC>
                for (int j = 0; j < numberOfNColumns; j++) { //<LRC>
                    locks[i][j] = new ReentrantLock(); //<LRC>
                } //<LRC>
            } //<LRC>

            ExecutorService executor = createExecutor();

            Collection<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < numberOfMRows; i++) {
                final int finalI = i;
                for (int j = 0; j < numberOfNColumns; j++) {
                    final int finalJ = j;
                    tasks.add(new Callable<Void>() {
                        @Override
                        public Void call() throws Exception {
                            try { //<LRC>
                                locks[finalI][finalJ].lock(); //<LRC>
                                calculateCell(finalI, finalJ);
                            } finally { //<LRC>
                                locks[finalI][finalJ].unlock(); //<LRC>
                            } //<LRC>
                            return null;
                        }
                    });
                }
            }

            try {
                executor.invokeAll(tasks);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                executor.shutdownNow();
            }

            for (int i = 0; i < numberOfMRows; i++) { //<LRC>
                for (int j = 0; j < numberOfNColumns; j++) { //<LRC>
                    locks[i][j].lock(); //<LRC>
                    locks[i][j].unlock(); //<LRC>
                } //<LRC>
            } //<LRC>
        }

        return result;
    }

The sole purpose of the explicit locks above is to ensure that the changes are published to the invoking thread, since there is no possibility of contention.

My main question is whether this is a valid solution to the problem of publishing side effects in my scenario.

As a secondary question: is there a more efficient/elegant way to solve this problem? Please note that I am not looking for alternative algorithm implementations (e.g., Strassen's algorithm) for parallelizing matrix multiplication, since mine is just a simple case study. I am rather interested in alternatives for ensuring the visibility of changes in an algorithm like the one presented here.

UPDATE

I think the alternative implementation below improves on the previous one. It uses a single internal lock without much affecting concurrency:

public class MatrixMultiplier {
    ...
    private final Object internalLock = new Object();

    public synchronized int[][] multiply() {
        if (result == null) {
            result = new int[numberOfMRows][numberOfNColumns];

            ExecutorService executor = createExecutor();

            Collection<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < numberOfMRows; i++) {
                final int finalI = i;
                for (int j = 0; j < numberOfNColumns; j++) {
                    final int finalJ = j;
                    tasks.add(new Callable<Void>() {
                        @Override
                        public Void call() throws Exception {
                            calculateCell(finalI, finalJ);
                            synchronized (internalLock){}
                            return null;
                        }
                    });
                }
            }

            try {
                executor.invokeAll(tasks);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                executor.shutdownNow();
            }

        }

        synchronized (internalLock){}

        return result;
    }
    ...
}

This alternative is simply more efficient, but both it and the previous implementation with many locks look correct to me. Are all my observations correct? Is there a more efficient/elegant way to deal with the synchronization problem in my scenario?

Declaring result as volatile only ensures that changes to the reference itself (i.e., result = ...; assignments) are visible to other threads; it says nothing about the array elements written through that reference.

The most obvious way to resolve this is to eliminate the side effect. In this case that is easy: just make calculateCell() and the Callable invoking it return the value, and let the main thread write the values into the array.
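A rough sketch of that approach (the class and method names are mine, and I use lambdas for brevity): each Callable returns its cell value, and only the main thread, after invokeAll() returns, writes into the result array. The Future.get() calls also provide the needed happens-before edges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReturnValueMultiplier {

    // Submits one Callable per cell; tasks only read the shared input
    // matrices and return the cell value, so there are no concurrent writes.
    static int[][] multiply(int[][] m, int[][] n) throws Exception {
        final int rows = m.length, cols = n[0].length, common = n.length;
        ExecutorService executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < rows; i++) {
                final int row = i;
                for (int j = 0; j < cols; j++) {
                    final int col = j;
                    tasks.add(() -> {
                        int sum = 0;
                        for (int k = 0; k < common; k++) {
                            sum += m[row][k] * n[k][col];
                        }
                        return sum; // no shared write here
                    });
                }
            }
            List<Future<Integer>> futures = executor.invokeAll(tasks);
            int[][] result = new int[rows][cols];
            int idx = 0;
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    // only the calling thread writes; get() publishes the value
                    result[i][j] = futures.get(idx++).get();
                }
            }
            return result;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        int[][] r = multiply(a, b);
        System.out.println(r[0][0] + " " + r[0][1] + " " + r[1][0] + " " + r[1][1]);
        // prints "19 22 43 50"
    }
}
```

The tasks must be collected in the same row-major order that the main thread uses to drain the futures, since invokeAll() returns its Futures in submission order.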

You could of course use explicit locking, as in your second example, but it seems overkill to use n×m locks when you could use just one. One lock would of course kill the parallelism in your example, so once again the solution is to make calculateCell() return the value and lock only for the duration of writing the result into the result array.
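A sketch of that single-lock variant (the class name is mine): each task computes its cell without holding the lock and synchronizes only for the write, and the reading thread acquires the same lock once before returning, which pairs the lock acquisitions to establish the happens-before relationship.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LockedWriteMultiplier {

    static int[][] multiply(int[][] m, int[][] n) throws InterruptedException {
        final int rows = m.length, cols = n[0].length, common = n.length;
        final int[][] result = new int[rows][cols];
        final Object lock = new Object();
        ExecutorService executor = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < rows; i++) {
                final int row = i;
                for (int j = 0; j < cols; j++) {
                    final int col = j;
                    tasks.add(() -> {
                        int sum = 0;
                        for (int k = 0; k < common; k++) {
                            sum += m[row][k] * n[k][col]; // lock-free computation
                        }
                        synchronized (lock) {
                            result[row][col] = sum; // brief critical section per write
                        }
                        return null;
                    });
                }
            }
            executor.invokeAll(tasks);
        } finally {
            executor.shutdownNow();
        }
        synchronized (lock) {} // reader acquires the same lock to see the writes
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        int[][] r = multiply(a, b);
        System.out.println(r[1][1]); // prints "50"
    }
}
```

The empty synchronized block before the return plays the same role as the one in your UPDATE: under the Java Memory Model, it is the reader acquiring the same monitor that the writers released that makes their writes visible.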

Or indeed you can use the Fork/Join framework and forget about the whole thing, because it handles visibility for you: join() guarantees that a task's writes are visible to the joining thread.
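A possible Fork/Join sketch (the class names and the threshold value are my assumptions): a RecursiveAction recursively splits the row range, and the framework's invokeAll()/join() takes care of publishing each subtask's writes to the thread that joins it.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ForkJoinMultiplier {

    static final int THRESHOLD = 16; // rows per leaf task; a tuning assumption

    static class RowRangeTask extends RecursiveAction {
        final int[][] m, n, result;
        final int loRow, hiRow;

        RowRangeTask(int[][] m, int[][] n, int[][] result, int loRow, int hiRow) {
            this.m = m; this.n = n; this.result = result;
            this.loRow = loRow; this.hiRow = hiRow;
        }

        @Override
        protected void compute() {
            if (hiRow - loRow <= THRESHOLD) {
                // small enough: compute these rows directly
                for (int i = loRow; i < hiRow; i++) {
                    for (int j = 0; j < n[0].length; j++) {
                        int sum = 0;
                        for (int k = 0; k < n.length; k++) {
                            sum += m[i][k] * n[k][j];
                        }
                        result[i][j] = sum;
                    }
                }
            } else {
                int mid = (loRow + hiRow) >>> 1;
                // invokeAll forks and joins both halves;
                // join() makes their writes visible here
                invokeAll(new RowRangeTask(m, n, result, loRow, mid),
                          new RowRangeTask(m, n, result, mid, hiRow));
            }
        }
    }

    static int[][] multiply(int[][] m, int[][] n) {
        int[][] result = new int[m.length][n[0].length];
        ForkJoinPool.commonPool().invoke(new RowRangeTask(m, n, result, 0, m.length));
        return result;
    }

    public static void main(String[] args) {
        int[][] a = {{1, 2}, {3, 4}};
        int[][] b = {{5, 6}, {7, 8}};
        System.out.println(multiply(a, b)[0][0]); // prints "19"
    }
}
```

Splitting by row ranges rather than one task per cell also keeps the task count proportional to the matrix height instead of its area, which is usually a better fit for Fork/Join's work-stealing scheduler.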
