Is it safe to use parallelStream() to populate a Map in Java 8
I have a list of 1 million objects, and I need to populate them into a Map. To reduce the time this takes, I am planning to use Java 8's parallelStream(), like this:
List<Person> list = new LinkedList<>();
Map<String, String> map = new HashMap<>();
list.parallelStream().forEach(person -> {
    map.put(person.getName(), person.getAge());
});
I want to ask: is it safe to populate a Map like this from parallel threads? Isn't it possible to have concurrency issues, so that some data gets lost in the Map?
It is perfectly safe to use parallelStream() to collect into a HashMap. However, it is not safe to use parallelStream(), forEach and a consumer that adds things to a HashMap.

HashMap is not a synchronized class, and trying to put elements into it concurrently will not work properly. That is exactly what forEach does here: it invokes the given consumer, which puts elements into the HashMap, from multiple threads, possibly at the same time. If you want a simple piece of code demonstrating the issue:
List<Integer> list = IntStream.range(0, 10000).boxed().collect(Collectors.toList());
Map<Integer, Integer> map = new HashMap<>();
list.parallelStream().forEach(i -> {
    map.put(i, i);
});
System.out.println(list.size());
System.out.println(map.size());
Make sure to run it a couple of times. There is a very good chance (the joy of concurrency) that the printed map size after the operation is not 10000, the size of the list, but slightly less.
The solution here, as always, is not to use forEach, but to use a mutable reduction approach with the collect method and the built-in toMap collector:
Map<Integer, Integer> map = list.parallelStream().collect(Collectors.toMap(i -> i, i -> i));
Use that line of code in the sample above, and you can rest assured that the map size will always be 10000. The Stream API ensures that it is safe to collect into a non-thread-safe container, even in parallel. This also means that you don't need to use toConcurrentMap to be safe; that collector is only needed if you specifically want a ConcurrentMap as the result rather than a general Map. As far as thread safety with collect is concerned, you can use either.
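For completeness, here is a self-contained sketch of the safe approach (class name is illustrative). collect() performs a mutable reduction: each worker thread fills its own intermediate HashMap, and the partial maps are merged afterwards, so no map is ever mutated from two threads at once.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelToMapDemo {
    public static void main(String[] args) {
        List<Integer> list = IntStream.range(0, 10000)
                .boxed()
                .collect(Collectors.toList());

        // Thread-safe mutable reduction: partial HashMaps are built
        // per thread and merged, so the result is deterministic.
        Map<Integer, Integer> map = list.parallelStream()
                .collect(Collectors.toMap(i -> i, i -> i));

        System.out.println(map.size()); // always 10000
    }
}
```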
HashMap isn't thread-safe, but ConcurrentHashMap is; use that instead:

Map<String, String> map = new ConcurrentHashMap<>();

and your code will work as expected.
forEach() vs toMap() performance

After JVM warm-up, with 1M elements, using parallel streams and median timings, the forEach() version was consistently 2-3 times faster than the toMap() version. Results were consistent across all-unique, 25%-duplicate and 100%-duplicate inputs.
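One caveat about the duplicate-input cases mentioned above: the two-argument Collectors.toMap throws an IllegalStateException when two elements map to the same key, so a benchmark with duplicate keys presumably uses the three-argument overload with a merge function. A small sketch (the word list is illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "cc", "ddd");

        // "bb" and "cc" collide on key 2; without the merge function
        // (a, b) -> a, toMap would throw IllegalStateException.
        Map<Integer, String> byLength = words.parallelStream()
                .collect(Collectors.toMap(String::length, w -> w, (a, b) -> a));

        System.out.println(byLength.size()); // 3 (keys 1, 2, 3)
    }
}
```

Which of the colliding values survives is not guaranteed to be deterministic under parallel merging, but the key set always is.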