
How Do I Perform a Matrix Calculation With a Parallel Stream in Java?

I'm trying to write a matrix arithmetic method using multidimensional arrays ([verybigrow][2]). I'm new at this, and I just can't find what I'm doing wrong. I'd really appreciate any help in pointing it out.

    try {
        Stream<String> Matrix = Files.lines(Paths.get(file)).parallel();
        String[][] DataSet = Matrix.map(mapping -> mapping.split(",")).toArray(String[][]::new);
        Double[][] distanceTable = new Double[DataSet.length - 1][];

        /* START: WANT TO REPLACE THIS MATRIX CALCULATION WITH A PARALLEL STREAM RATHER THAN TRADITIONAL ARRAY ARITHMETIC */

        // Row i holds the squared distances from row i + 1 to rows 0..i.
        // The bound is distanceTable.length (not length - 1) so the last row is filled too.
        for (int i = 0; i < distanceTable.length; ++i) {
            distanceTable[i] = new Double[i + 1];
            for (int j = 0; j <= i; ++j) {
                double distance = 0.0;
                for (int k = 0; k < DataSet[i + 1].length; ++k) {
                    double difference = Double.parseDouble(DataSet[j][k]) - Double.parseDouble(DataSet[i + 1][k]);
                    distance += difference * difference;
                }
                distanceTable[i][j] = distance;
            }
        }

        /* END: WANT TO REPLACE THIS MATRIX CALCULATION WITH A PARALLEL STREAM RATHER THAN TRADITIONAL ARRAY ARITHMETIC */

    } catch (Exception except) {
        System.out.println(except);
    }

I'd rather not use libraries or anything like that; I'm mostly doing this to learn how it works. Thank you so much in advance. In case you're wondering, the data looks like this:

4,53
5,63
10,59
9,77
13,49

The output of the data processing should look like this:

[101.0] <- ((4-5)^2) + ((53-63)^2)
[72.0, 41.0] <- ( ((4-10)^2) + ((53-59)^2) ), ( ((5-10)^2) + ((63-59)^2) )
[601.0, 212.0, 325.0]
[97.0, 260.0, 109.0, 800.0]
[337.0, 100.0, 109.0, 80.0, 400.0]

I replaced matrixDistance with distanceTable. Try moving this code into a separate method so that you can run it in parallel:

        // Row i holds the squared distances from row i + 1 to rows 0..i.
        for (int i = 0; i < distanceTable.length; ++i) {
            distanceTable[i] = new Double[i + 1];
            for (int j = 0; j <= i; ++j) {
                double distance = 0.0;
                for (int k = 0; k < DataSet[i + 1].length; ++k) {
                    double difference = Double.parseDouble(DataSet[j][k]) - Double.parseDouble(DataSet[i + 1][k]);
                    distance += difference * difference;
                }
                distanceTable[i][j] = distance;
            }
        }
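As a rough sketch of that idea (not the original poster's code): assuming the CSV has already been parsed into a `double[][]`, the body of the outer loop can become a method that builds one row, and a parallel `IntStream` can map each row index to its row. The class name `ParallelDistanceSketch` and the method names `distanceRow` and `distanceTable` are made up for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class ParallelDistanceSketch {

    // Squared Euclidean distances from row i + 1 to every earlier row 0..i.
    static double[] distanceRow(double[][] data, int i) {
        double[] target = data[i + 1];
        double[] row = new double[i + 1];
        for (int j = 0; j <= i; j++) {
            double distance = 0.0;
            for (int k = 0; k < target.length; k++) {
                double difference = data[j][k] - target[k];
                distance += difference * difference;
            }
            row[j] = distance;
        }
        return row;
    }

    // Builds the triangular table. Each index is mapped independently and
    // toArray keeps the rows in index order, so no shared state is mutated.
    static double[][] distanceTable(double[][] data) {
        return IntStream.range(0, data.length - 1)
                .parallel()
                .mapToObj(i -> distanceRow(data, i))
                .toArray(double[][]::new);
    }

    public static void main(String[] args) {
        // The five sample rows from the question.
        double[][] data = {{4, 53}, {5, 63}, {10, 59}, {9, 77}, {13, 49}};
        System.out.println(Arrays.deepToString(distanceTable(data)));
    }
}
```

For the five sample rows shown in the question, this yields the first four rows of the expected output. Because every row is produced as a return value instead of being written into a shared array, this variant needs no locking.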

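I've created this example based on your question.

    public void parallel(String file) {
        ....
        // parse the CSV into a 2D matrix: Double[][] data
        ....
        // i is the index of the row that is compared with all earlier rows;
        // the upper bound of range() is exclusive, so data.length covers the last row
        IntStream
            .range(1, data.length)
            .parallel()
            .forEach(i -> add(euclidian.euclidian(Arrays.copyOf(data, i + 1)), i));
    }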

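This is a reduced version of your algorithm. It computes the squared distance from the last row of the array it receives to every earlier row, and it assumes two columns per row.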

    public Double[] euclidian(Double[][] data) {
        Double[] result = new Double[data.length - 1];
        for (int i = 0; i < result.length; i++) {
            result[i] =
                    Math.pow(data[i][0] - data[data.length - 1][0], 2) +
                            Math.pow(data[i][1] - data[data.length - 1][1], 2);
        }

        return result;
    }

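Because the rows are written from several threads, add a synchronized method for inserting data into distanceTable. Each task writes a different index, so the lock is a safeguard here rather than a strict requirement.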

    private final Object lock = new Object();
    Double[][] distanceTable;

    void add(Double[] data, int index){
        synchronized (lock) {
            distanceTable[index - 1] = data;
        }
    }
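If the lock feels heavy, one alternative (a sketch, not part of the original answer) is to let `Arrays.parallelSetAll` fill the table: it calls a generator once per index, in parallel, and stores each result in its own slot. The class name `LockFreeFillSketch` and the method `fill` are hypothetical. The `euclidian` method here is a primitive-`double` variant of the method above and, like it, assumes two columns per row.

```java
import java.util.Arrays;

final class LockFreeFillSketch {

    // Squared distances from the last row of data to every earlier row
    // (two columns assumed, as in the answer's euclidian method).
    static double[] euclidian(double[][] data) {
        double[] last = data[data.length - 1];
        double[] result = new double[data.length - 1];
        for (int i = 0; i < result.length; i++) {
            double dx = data[i][0] - last[0];
            double dy = data[i][1] - last[1];
            result[i] = dx * dx + dy * dy;
        }
        return result;
    }

    // Slot i receives the distances from row i + 1 to rows 0..i.
    static double[][] fill(double[][] data) {
        double[][] distanceTable = new double[Math.max(0, data.length - 1)][];
        // copyOf(data, i + 2) keeps rows 0..i+1, so row i + 1 is the "last" row.
        Arrays.parallelSetAll(distanceTable,
                i -> euclidian(Arrays.copyOf(data, i + 2)));
        return distanceTable;
    }
}
```

Calling `LockFreeFillSketch.fill(data)` returns the finished table, so the shared `distanceTable` field and the `add` method are not needed in this variant.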

I've tested it on my laptop. For a CSV file with 74 rows, the comparison looks like this (ORI is your code, PAR is my approach):

java -jar target/stream-example-1.0-SNAPSHOT.jar test.csv 
#####################
ORI read: 59 ms
ORI  map: 71 ms
ORI time: 80 ms
#####################
PAR read: 0 ms
PAR  map: 6 ms
PAR time: 11 ms

Hope it helps.

@Fahim Bagar's example should run faster with big data sets, but you should improve your single-threaded code before drawing hasty conclusions from timings that compare it with the parallel version.

For example, the wasteful Double.parseDouble calls in the inner loop are easy to remove. As in the example from @Fahim Bagar, replace String[][] DataSet with Double[][] DataSet so that each value is parsed only once:

    // String[][] DataSet = Matrix.map(mapping -> mapping.split(",")).toArray(String[][]::new);
    Double[][] DataSet = Matrix
            .map(row -> Arrays.stream(row.split(",")).map(Double::parseDouble).toArray(Double[]::new))
            .toArray(Double[][]::new);

Then hoist the array references DataSet[i + 1] and DataSet[j] into local variables outside the loops that use them:

    // The bound is distanceTable.length (not length - 1) so the last row is filled too.
    for (int i = 0; i < distanceTable.length; ++i) {
        Double[] arriplus1 = new Double[i + 1];
        Double[] iarr = DataSet[i + 1];
        for (int j = 0; j <= i; ++j) {
            double distance = 0.0;
            Double[] jarr = DataSet[j];
            for (int k = 0, sz = iarr.length; k < sz; ++k) {
                double difference = jarr[k] - iarr[k];
                distance += difference * difference;
            }
            arriplus1[j] = distance;
        }
        distanceTable[i] = arriplus1;
    }

You can do the same for @Fahim Bagar's euclidian method:

    public Double[] euclidian(Double[][] data) {
        Double[] result = new Double[data.length - 1];
        Double[] dL1 = data[data.length - 1];
        for (int i = 0; i < result.length; i++) {
            Double[] di = data[i];
            result[i] = Math.pow(di[0] - dL1[0], 2) + Math.pow(di[1] - dL1[1], 2);
        }
        return result;
    }

After that, replacing Double with the primitive double would speed things up further and cut down on memory allocations.
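A minimal sketch of that last step, assuming every line holds only comma-separated numbers with no header or blank lines: `mapToDouble` yields a primitive `double[]` per line, so no `Double` objects are created. The class name `PrimitiveCsvSketch` and the method `parse` are illustrative, not taken from either answer.

```java
import java.util.Arrays;
import java.util.stream.Stream;

final class PrimitiveCsvSketch {

    // Turns each "a,b,..." line into a primitive double[] row.
    static double[][] parse(Stream<String> lines) {
        return lines
                .map(line -> Arrays.stream(line.split(","))
                        .mapToDouble(Double::parseDouble)
                        .toArray())
                .toArray(double[][]::new);
    }

    public static void main(String[] args) {
        // A few of the sample rows from the question.
        double[][] data = parse(Stream.of("4,53", "5,63", "10,59"));
        System.out.println(Arrays.deepToString(data));
    }
}
```

Passing `Files.lines(Paths.get(file))` as the stream would give a `double[][]` that the distance loops can use directly, with `Double[]` replaced by `double[]` throughout.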

With a CSV file of 1048 rows, I see these timings on the 10th run of each:

#####################
ORI read: 0 ms
ORI  map: 4 ms
ORI time: 14 ms
#####################
PAR read: 0 ms
PAR  map: 1 ms
PAR time: 10 ms
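
Measuring a later run instead of the first one can be sketched as below: run the task a few times untimed so the JIT has a chance to warm up, then time one more run with `System.nanoTime()`. This only illustrates the approach and is not the harness used for the numbers above. `TimingSketch` and `timeAfterWarmup` are made-up names, and a dedicated tool such as JMH would give more reliable results.

```java
import java.util.concurrent.TimeUnit;

final class TimingSketch {

    // Runs the task warmupRuns times untimed, then returns the elapsed
    // milliseconds of one further run.
    static long timeAfterWarmup(Runnable task, int warmupRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute the ORI or PAR calculation here.
        long ms = timeAfterWarmup(() -> {
            double sum = 0.0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += Math.sqrt(i);
            }
            if (sum < 0) {
                System.out.println(sum); // uses the result so the loop is not dead code
            }
        }, 9);
        System.out.println("time: " + ms + " ms");
    }
}
```

With `warmupRuns` set to 9, the timed run is the 10th, matching the way the timings above were taken.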
