Is it safe to use parallelStream() to populate a Map in Java 8?

I have a list of 1 million objects that I need to load into a Map. To reduce the time this takes, I am planning to use Java 8's parallelStream() like this:

List<Person> list = new LinkedList<>();
Map<String, String> map = new HashMap<>();
list.parallelStream().forEach(person ->{
    map.put(person.getName(), person.getAge());
});

Is it safe to populate a Map like this from parallel threads? Couldn't this cause concurrency issues, so that some data gets lost in the Map?

It is safe to use parallelStream() to collect into a HashMap. However, it is not safe to combine parallelStream(), forEach and a consumer that adds entries to a HashMap.

HashMap is not a synchronized class, and concurrent put calls on it will not work properly. That is exactly what forEach does here: it invokes the given consumer, which puts elements into the HashMap, from multiple threads, possibly at the same time. Here is a simple piece of code that demonstrates the issue:

List<Integer> list = IntStream.range(0, 10000).boxed().collect(Collectors.toList());
Map<Integer, Integer> map = new HashMap<>();
list.parallelStream().forEach(i -> {
    map.put(i, i);
});
System.out.println(list.size());
System.out.println(map.size());

Make sure to run it a couple of times. There's a very good chance (the joy of concurrency) that the printed map size after the operation is not 10000, which is the size of the list, but slightly less.

The solution here, as always, is not to use forEach, but to use a mutable reduction with the collect method and the built-in toMap collector:

Map<Integer, Integer> map = list.parallelStream().collect(Collectors.toMap(i -> i, i -> i));

Use that line in the sample code above, and you can rest assured that the map size will always be 10000. The Stream API ensures that it is safe to collect into a non-thread-safe container, even in parallel. This also means that you don't need toConcurrentMap to be safe; that collector is only needed if you specifically want a ConcurrentMap as the result rather than a general Map. As far as the thread safety of collect is concerned, you can use either.
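One caveat applies when this approach is carried over to the Person list from the question: the two-argument toMap throws an IllegalStateException if two elements produce the same key, and names may well repeat. The sketch below passes a merge function as a third argument to handle that. The Person class here is a hypothetical stand-in (the question does not show it), and "keep the later element" is just one possible merge policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapWithDuplicates {

    // Hypothetical stand-in for the question's Person class (not shown in the question).
    static final class Person {
        private final String name;
        private final String age;

        Person(String name, String age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        String getAge() { return age; }
    }

    // Collects name -> age in parallel. The third argument resolves duplicate
    // names; for an ordered source such as a List, the element that comes
    // later in encounter order wins.
    static Map<String, String> toNameAgeMap(List<Person> people) {
        return people.parallelStream()
                .collect(Collectors.toMap(
                        Person::getName,
                        Person::getAge,
                        (earlier, later) -> later));
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("Alice", "30"));
        people.add(new Person("Bob", "25"));
        people.add(new Person("Alice", "31")); // duplicate key

        Map<String, String> map = toNameAgeMap(people);
        System.out.println(map.size());       // 2
        System.out.println(map.get("Alice")); // 31
    }
}
```

Without the merge function, the same input would fail on the second "Alice" instead of silently overwriting it, which is often the safer default when duplicates are unexpected.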

HashMap isn't thread-safe, but ConcurrentHashMap is; use that instead:

Map<String, String> map = new ConcurrentHashMap<>();

and your code will work as expected.
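As a minimal sketch of this variant, here is the 10000-integer demonstration from earlier in the thread with only the map type swapped. The class and method names (ConcurrentPutDemo, fill) are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ConcurrentPutDemo {

    // Fills a ConcurrentHashMap with the keys 0..n-1 from a parallel forEach.
    static Map<Integer, Integer> fill(int n) {
        List<Integer> list = IntStream.range(0, n).boxed().collect(Collectors.toList());
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        // put is safe to call from several threads on a ConcurrentHashMap
        list.parallelStream().forEach(i -> map.put(i, i));
        return map;
    }

    public static void main(String[] args) {
        System.out.println(fill(10000).size()); // 10000
    }
}
```

Two things to keep in mind with this approach: ConcurrentHashMap rejects null keys and null values with a NullPointerException, and if several elements map to the same key, which value ends up in the map depends on thread timing.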


Performance comparison of forEach() vs toMap()

With 1M elements, parallel streams, and median timings taken after JVM warm-up, the forEach() version was consistently 2-3 times faster than the toMap() version.

Results were consistent between all-unique, 25% duplicate and 100% duplicate inputs.
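The benchmark code behind these numbers is not shown, so the following is only a sketch of how such a measurement might be set up. It assumes the forEach() version writes into a ConcurrentHashMap and that the toMap() version uses a merge function so duplicate inputs do not throw. All names (MapFillTiming, viaForEach, viaToMap, medianMillis) and the warm-up and run counts are assumptions, and a dedicated harness such as JMH would give more trustworthy results than hand-rolled timing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class MapFillTiming {

    // Keeps results observable so the JIT cannot discard the work being timed.
    static volatile int sink;

    // Assumed "forEach() version": parallel forEach into a ConcurrentHashMap.
    static Map<Integer, Integer> viaForEach(List<Integer> list) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        list.parallelStream().forEach(i -> map.put(i, i));
        return map;
    }

    // Assumed "toMap() version": parallel collect with a merge function.
    static Map<Integer, Integer> viaToMap(List<Integer> list) {
        return list.parallelStream()
                .collect(Collectors.toMap(i -> i, i -> i, (a, b) -> b));
    }

    // Runs the task `warmups` times untimed, then `runs` times timed, and
    // returns the middle element of the sorted timings in milliseconds
    // (the median when `runs` is odd).
    static double medianMillis(Function<List<Integer>, Map<Integer, Integer>> task,
                               List<Integer> input, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            sink = task.apply(input).size();
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            Map<Integer, Integer> result = task.apply(input);
            nanos[i] = System.nanoTime() - start;
            sink = result.size();
        }
        Arrays.sort(nanos);
        return nanos[runs / 2] / 1_000_000.0;
    }

    public static void main(String[] args) {
        List<Integer> input = IntStream.range(0, 1_000_000).boxed()
                .collect(Collectors.toList());
        System.out.printf("forEach + ConcurrentHashMap: %.1f ms%n",
                medianMillis(MapFillTiming::viaForEach, input, 10, 21));
        System.out.printf("collect(toMap):              %.1f ms%n",
                medianMillis(MapFillTiming::viaToMap, input, 10, 21));
    }
}
```

The absolute numbers will vary with hardware, JVM version and the number of available cores, so any ratio should be checked on the target machine rather than taken from this sketch.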
