Is it safe to use parallelStream() to populate a Map in Java 8
I have a list of 1 million objects, and I need to populate them into a Map. To reduce the time this takes, I am planning to use Java 8's parallelStream(), like this:
List<Person> list = new LinkedList<>();
Map<String, String> map = new HashMap<>();
list.parallelStream().forEach(person -> {
    map.put(person.getName(), person.getAge());
});
I want to ask: is it safe to populate a Map like this from parallel threads? Isn't it possible to have concurrency issues, so that some data gets lost in the Map?
It is perfectly safe to use parallelStream() to collect into a HashMap. However, it is not safe to use parallelStream(), forEach and a consumer that adds things to a HashMap.

HashMap is not a synchronized class, and trying to put elements into it concurrently will not work properly. That is exactly what forEach does here: it invokes the given consumer, which puts elements into the HashMap, from multiple threads, possibly at the same time. If you want a simple piece of code demonstrating the issue:
List<Integer> list = IntStream.range(0, 10000).boxed().collect(Collectors.toList());
Map<Integer, Integer> map = new HashMap<>();
list.parallelStream().forEach(i -> {
    map.put(i, i);
});
System.out.println(list.size());
System.out.println(map.size());
Make sure to run it a couple of times. There is a very good chance (the joy of concurrency) that the printed map size after the operation is not 10000, the size of the list, but slightly less.
The solution here, as always, is not to use forEach, but to use a mutable reduction approach with the collect method and the built-in toMap collector:
Map<Integer, Integer> map = list.parallelStream().collect(Collectors.toMap(i -> i, i -> i));
Use that line of code in the sample above, and you can rest assured that the map size will always be 10000. The Stream API ensures that it is safe to collect into a non-thread-safe container, even in parallel. This also means that you don't need to use toConcurrentMap to be safe; that collector is only needed if you specifically want a ConcurrentMap as the result rather than a general Map. As far as thread safety with collect is concerned, you can use either.
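For completeness, here is a self-contained sketch of the safe approach (class name is illustrative). collect() performs a mutable reduction: each worker thread fills its own intermediate HashMap, and the partial maps are merged afterwards, so no map is ever mutated from two threads at once.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelToMapDemo {
    public static void main(String[] args) {
        List<Integer> list = IntStream.range(0, 10000)
                .boxed()
                .collect(Collectors.toList());

        // Thread-safe mutable reduction: partial HashMaps are built
        // per thread and merged, so the result is deterministic.
        Map<Integer, Integer> map = list.parallelStream()
                .collect(Collectors.toMap(i -> i, i -> i));

        System.out.println(map.size()); // always 10000
    }
}
```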
HashMap isn't thread-safe, but ConcurrentHashMap is; use that instead:

Map<String, String> map = new ConcurrentHashMap<>();

and your code will work as expected.
forEach() vs toMap() performance

After JVM warm-up, with 1M elements, using parallel streams and median timings, the forEach() version was consistently 2-3 times faster than the toMap() version. Results were consistent across all-unique, 25%-duplicate and 100%-duplicate inputs.
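One caveat about the duplicate-input cases mentioned above: the two-argument Collectors.toMap throws an IllegalStateException when two elements map to the same key, so a benchmark with duplicate keys presumably uses the three-argument overload with a merge function. A small sketch (the word list is illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "ddd");

        // "bb" and "cc" collide on key 2; without the merge function
        // (a, b) -> a, toMap would throw IllegalStateException.
        Map<Integer, String> byLength = words.parallelStream()
                .collect(Collectors.toMap(String::length, w -> w, (a, b) -> a));

        System.out.println(byLength.size()); // 3 (keys 1, 2, 3)
    }
}
```

Which of the colliding values survives is not guaranteed to be deterministic under parallel merging, but the key set always is.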