Java parallel stream not working as expected

I have the following code:

Map<String, Person> targetPerson = targetPersonList
                                     .stream()
                                     .collect(toMap(Person::getKey, Function.identity()));

where targetPersonList is quite a large list, and the above code takes about 38 minutes to complete. So I thought the following code should speed it up a little:

Map<String, Person> targetPerson = targetPersonList
                                     .parallelStream()
                                     .collect(toMap(Person::getKey, Function.identity()));

It's actually the opposite: the parallel version takes 1 hour and 20 minutes. I have an 8th-generation Core i7, which should have 6 cores and 12 threads, so what is the problem? Is there something fundamentally wrong with my understanding of parallel streams?

Needing 38 minutes just to fill a HashMap is an unusually long time. It suggests that either Person::getKey performs an expensive construction, or the result is an object with a less-than-optimal hashCode or equals implementation.

On my machine, filling a map with ten million elements that have a reasonable hashCode and equals implementation takes less than a second; a hundred million still needs only a few seconds, and beyond that, memory consumption becomes the problem.
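To get a feeling for how much the key's hashCode matters, here is a minimal timing sketch. The Key class and its "degenerate" flag are hypothetical and exist only for illustration; they are not taken from the question's code. The numbers depend heavily on the machine, JVM, and heap size, so treat this as a rough probe rather than a proper benchmark.

```java
import java.util.HashMap;
import java.util.Map;

public class MapFillTiming {

    /** Hypothetical key type; 'degenerate' forces every key into the same bucket. */
    static final class Key {
        final int id;
        final boolean degenerate;

        Key(int id, boolean degenerate) {
            this.id = id;
            this.degenerate = degenerate;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return other.id == id && other.degenerate == degenerate;
        }

        @Override
        public int hashCode() {
            // A constant hash is legal but makes every insertion collide.
            return degenerate ? 0 : Integer.hashCode(id);
        }
    }

    /** Fills a HashMap with n distinct keys and returns it. */
    static Map<Key, Integer> fill(int n, boolean degenerate) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i, degenerate), i);
        }
        return map;
    }

    /** Returns the wall-clock milliseconds needed to fill a map with n keys. */
    static long timeFillMillis(int n, boolean degenerate) {
        long start = System.nanoTime();
        fill(n, degenerate);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // The degenerate run uses far fewer keys because its cost grows much faster.
        System.out.println("well-distributed hashCode, 1,000,000 keys: "
                + timeFillMillis(1_000_000, false) + " ms");
        System.out.println("constant hashCode, 20,000 keys: "
                + timeFillMillis(20_000, true) + " ms");
    }
}
```

If the real key type behaves more like the degenerate case, that alone could explain a runtime measured in minutes instead of seconds.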

That said, the worse performance of the parallel stream doesn't come as a surprise. As discussed in “Should I always use a parallel stream when possible?”, parallel processing has some fixed overhead, and you need a significant per-element workload for the benefit to outweigh that overhead.

In your specific example, there's no benefit at all.

The parallel collect operation works by splitting the stream elements into chunks to be processed by different worker threads. Each worker creates a new local container (in the case of toMap, a map of the same type as the end result) and accumulates its elements into that container, i.e. puts values into the map. When two worker threads have finished their work, their partial results are merged, which implies putting all elements of one map into the other.
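The three steps can be spelled out with the three-argument Stream.collect, which takes the container supplier, the accumulator, and the combiner as separate functions. This is only a sketch of the mechanism, not the actual toMap implementation; the Person class and the samplePeople helper are hypothetical stand-ins for the question's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

public class ParallelCollectSketch {

    /** Hypothetical stand-in for the Person class from the question. */
    static final class Person {
        private final String key;

        Person(String key) {
            this.key = key;
        }

        String getKey() {
            return key;
        }
    }

    /** Hypothetical helper: builds n people with unique keys "k0", "k1", ... */
    static List<Person> samplePeople(int n) {
        List<Person> people = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            people.add(new Person("k" + i));
        }
        return people;
    }

    static Map<String, Person> collectByKey(List<Person> people) {
        // 1. Each worker thread gets its own empty local container.
        Supplier<Map<String, Person>> newLocalMap = HashMap::new;

        // 2. Each worker puts the elements of its chunk into its local map.
        //    Simplification: toMap throws on a duplicate key, put() overwrites.
        BiConsumer<Map<String, Person>, Person> accumulate =
                (map, person) -> map.put(person.getKey(), person);

        // 3. Partial results are merged pairwise; every entry of 'right'
        //    is inserted a second time, now into 'left'.
        BiConsumer<Map<String, Person>, Map<String, Person>> merge =
                (left, right) -> left.putAll(right);

        return people.parallelStream().collect(newLocalMap, accumulate, merge);
    }

    public static void main(String[] args) {
        Map<String, Person> byKey = collectByKey(samplePeople(1_000_000));
        System.out.println("entries: " + byKey.size());
    }
}
```

Step 3 is the part that eats the gain: with unique keys, the merge repeats the same put work that the workers already did in parallel.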

Since you have no filtering operation and the absence of a merge function implies that all keys are unique, it's easy to conclude that in the best case you have two worker threads filling two maps of the same size perfectly in parallel, followed by putting one of these maps into the other, taking as much time as has been saved by the previous parallel processing.

Your example also doesn't include potentially expensive intermediate operations, so parallel processing could reduce the cost only if Person::getKey itself is expensive.

As discussed in this answer, using toConcurrentMap instead of toMap can improve such a scenario, as it allows skipping the merge operation, and having all-unique keys implies very low contention when all worker threads put into one map.
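A minimal sketch of that variant, again with a hypothetical Person class and samplePeople helper standing in for the question's code. Whether it is actually faster depends on the data and the machine, so measure before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ConcurrentCollectSketch {

    /** Hypothetical stand-in for the Person class from the question. */
    static final class Person {
        private final String key;

        Person(String key) {
            this.key = key;
        }

        String getKey() {
            return key;
        }
    }

    /** Hypothetical helper: builds n people with unique keys "k0", "k1", ... */
    static List<Person> samplePeople(int n) {
        List<Person> people = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            people.add(new Person("k" + i));
        }
        return people;
    }

    /**
     * All worker threads insert into one shared concurrent map,
     * so no per-thread maps have to be merged afterwards.
     * Like toMap, this throws IllegalStateException on a duplicate key.
     */
    static ConcurrentMap<String, Person> indexByKey(List<Person> people) {
        return people.parallelStream()
                .collect(Collectors.toConcurrentMap(Person::getKey, Function.identity()));
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Person> byKey = indexByKey(samplePeople(1_000_000));
        System.out.println("entries: " + byKey.size());
    }
}
```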

However, it's worth investigating the actual cause of the performance problem. When the issue is the hashCode or equals implementation of the key object, fixing it will gain far more. Also, concurrency can't solve problems related to an almost full heap.

Finally, toConcurrentMap returns a concurrent map, which may impose higher costs on subsequent processing, even when you don't intend to use this map with multiple threads.

It's beneficial to use a parallel stream if you have a heavy operator, for example a long-running map function. In your case, it's better not to use streams at all, because they will only slow things down.

However, I have a few pieces of advice.

  1. Since you probably want to use a HashMap, make sure you implement hashCode() well and cache its result.
  2. Initialize your map with the constructor that takes an initial capacity: new HashMap<>(targetPersonList.size());
  3. Use a for-loop and insert every element if all of the values are pre-calculated.
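The three points above could look roughly like the sketch below. CompositeKey, Person, samplePeople, and capacityFor are hypothetical names introduced for illustration. Two assumptions are worth stating: java.lang.String already caches its own hash, so the caching advice only matters for custom key types; and because HashMap resizes once it is 75% full by default, the sketch divides the expected size by 0.75 instead of passing the list size directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class PresizedMapSketch {

    /** Hypothetical immutable key that computes its hash once, in the constructor. */
    static final class CompositeKey {
        private final String firstName;
        private final String lastName;
        private final int hash; // caching is safe only because the fields never change

        CompositeKey(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.hash = Objects.hash(firstName, lastName);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof CompositeKey)) {
                return false;
            }
            CompositeKey other = (CompositeKey) o;
            return hash == other.hash
                    && Objects.equals(firstName, other.firstName)
                    && Objects.equals(lastName, other.lastName);
        }

        @Override
        public int hashCode() {
            return hash;
        }
    }

    /** Hypothetical stand-in for the Person class from the question. */
    static final class Person {
        private final CompositeKey key;

        Person(CompositeKey key) {
            this.key = key;
        }

        CompositeKey getKey() {
            return key;
        }
    }

    /** Hypothetical helper: builds n people with unique keys. */
    static List<Person> samplePeople(int n) {
        List<Person> people = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            people.add(new Person(new CompositeKey("first" + i, "last" + i)));
        }
        return people;
    }

    /** Initial capacity that holds 'expected' entries without a resize at load factor 0.75. */
    static int capacityFor(int expected) {
        return (int) Math.min(Integer.MAX_VALUE, (long) (expected / 0.75) + 1);
    }

    /** Plain loop into a pre-sized map: no stream machinery, no rehashing while filling. */
    static Map<CompositeKey, Person> indexByKey(List<Person> people) {
        Map<CompositeKey, Person> byKey = new HashMap<>(capacityFor(people.size()));
        for (Person person : people) {
            byKey.put(person.getKey(), person);
        }
        return byKey;
    }

    public static void main(String[] args) {
        System.out.println("entries: " + indexByKey(samplePeople(1_000_000)).size());
    }
}
```

Unlike toMap, the loop silently overwrites on a duplicate key; check the return value of put if duplicates should be an error.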
