Scala parallel collection foreach returns different results

Why does adding a println statement inside the foreach function change the result?

import scala.collection.parallel.ForkJoinTaskSupport

var sum = 0
val list = (1 to 100).toList.par
// Run the parallel operations on a pool of 4 worker threads
list.tasksupport =
  new ForkJoinTaskSupport(new scala.concurrent.forkjoin.ForkJoinPool(4))
list.foreach((x: Int) => { println((x, sum)); sum += x })
// 5050
println(sum)
sum = 0
list.foreach((x: Int) => sum += x)
// results vary
println(sum)

That's a race condition: since list is a parallel collection, foreach runs in parallel and mutates the unsynchronised variable sum.

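Why does the first foreach print the right result, then? Because of the println inside the block; remove it and you will hit the data race.

println delegates to PrintStream.println. The threads queue up on its lock in every iteration, which tends to keep their unsynchronised updates to sum from overlapping; the update itself is still unprotected, so 5050 is likely but not guaranteed. The synchronized block inside println looks like this: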

public void println(Object x) {
    String s = String.valueOf(x);
    synchronized (this) {
        print(s);
        newLine();
    }
}
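If the shared counter has to stay, the update itself needs to be atomic rather than relying on the lock inside println. The following is a minimal sketch of that idea, not the original poster's code: the names AtomicSumSketch and sumWithThreads are made up for illustration, and plain threads stand in for the parallel collection so that the sketch does not depend on the parallel-collections module (a separate library from Scala 2.13 onward).

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper, for illustration only.
object AtomicSumSketch {
  // Adds all values using plain threads that share one atomic counter.
  // Assumes workers is positive.
  def sumWithThreads(values: Seq[Int], workers: Int): Int = {
    val sum = new AtomicInteger(0)
    // Split the input into at most `workers` chunks, one per thread.
    val chunkSize = math.max(1, (values.size + workers - 1) / workers)
    val threads = values.grouped(chunkSize).toList.map { chunk =>
      new Thread(new Runnable {
        // addAndGet performs the read-modify-write as one atomic step.
        def run(): Unit = chunk.foreach(x => sum.addAndGet(x))
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join()) // wait for every worker before reading the total
    sum.get()
  }

  def main(args: Array[String]): Unit =
    println(sumWithThreads(1 to 100, 4)) // 5050 on every run
}
```

With the parallel collection from the question, the same idea would be to call sum.addAndGet(x) inside foreach on an AtomicInteger instead of writing sum += x on a var.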

By the way, that's not a good way to parallelise a sum.
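One way to avoid the shared variable altogether is to let each worker return its own partial sum and combine the results afterwards. The sketch below assumes scala.concurrent.Future and the global execution context as the parallel mechanism; PartialSumsSketch, parallelSum and the chunk size are illustrative names and values, not anything from the question. On a parallel collection, the built-in sum or reduce(_ + _) does this kind of combining for you.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper, for illustration only.
object PartialSumsSketch {
  // Sums each chunk in its own Future, then combines the partial results.
  // Assumes chunkSize is positive.
  def parallelSum(values: Seq[Int], chunkSize: Int): Int = {
    val partials: List[Future[Int]] =
      values.grouped(chunkSize).toList.map(chunk => Future(chunk.sum))
    // Future.sequence turns List[Future[Int]] into Future[List[Int]].
    Await.result(Future.sequence(partials), 10.seconds).sum
  }

  def main(args: Array[String]): Unit =
    println(parallelSum(1 to 100, 25)) // 5050
}
```

No thread ever writes to a value that another thread reads, so there is nothing to synchronise.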

Scala encourages immutability over mutability precisely because things like this happen. When you have var variables, which can be changed, you can create race conditions: a value in memory is changed after another thread may or may not already have read it, and that thread does not realize the value has changed.

Summing in parallel like this can play out as follows once all the threads begin to call the function:

* 3 threads read the value of sum as 0
* 1 thread writes sum + x, which happens to be 34 (because the work is parallel, the additions happen in any order)
* 1 more thread writes sum + x, which it computes as 0 + 17 (assuming its x was 17), because it read the value 0 before the 34 was written to memory
* 2 more threads read 17
* the last of the first three threads writes 0 + 9, because it had read 0

TL;DR: the reads and writes to memory get out of sync because several threads may read while others are writing, and they overwrite each other's changes.
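The interleaving described above can be replayed by hand, without any real threads, to see where the updates get lost. This is only an illustrative trace: LostUpdateTrace and the readA, readB and readC names are made up here, and the values 34, 17 and 9 are the ones used in the walkthrough.

```scala
// Hypothetical single-threaded replay, for illustration only.
object LostUpdateTrace {
  // Each "thread" reads sum first and only later writes back
  // its own stale read plus its element.
  def replay(): Int = {
    var sum = 0
    val readA = sum      // thread A reads 0
    val readB = sum      // thread B reads 0
    val readC = sum      // thread C reads 0
    sum = readA + 34     // A writes 34
    sum = readB + 17     // B overwrites it with 0 + 17 = 17, so the 34 is lost
    sum = readC + 9      // C overwrites it with 0 + 9 = 9, so the 17 is lost
    sum
  }

  def main(args: Array[String]): Unit =
    println(replay()) // prints 9, although 34 + 17 + 9 = 60
}
```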

The solution is to do the work in sequence, or to leverage parallelization in a non-destructive way. Operations like sum should be done in sequence, or in ways that always generate new values instead of mutating a shared one, for example with foldLeft:

Seq(1, 2, 3, 4).foldLeft(0){case (sum, newVal) => sum + newVal}

Or you could write a function that splits the input into subsets, sums them in parallel, and then adds those partial sums together in sequence:

Seq(1, 2, 3, 4, 5, 6, 7, 8).grouped(2).toSeq.par.map { pair =>
  // each pair is summed independently, possibly on a different thread
  pair.foldLeft(0) { case (sum, newVal) => sum + newVal }
}.seq.foldLeft(0) { case (sum, newVal) => sum + newVal }
