Is it safe to use parallelStream() to populate a Map in Java 8?

I have a list of 1 million objects that I need to load into a Map. To reduce the time this takes, I am planning to use Java 8's parallelStream() like this:

List<Person> list = new LinkedList<>();
Map<String, String> map = new HashMap<>();
list.parallelStream().forEach(person ->{
    map.put(person.getName(), person.getAge());
});

Is it safe to populate a Map like this from parallel threads? Could concurrency issues cause some entries to be lost from the Map?

It is perfectly safe to use parallelStream() to collect into a HashMap. However, it is not safe to use parallelStream() with forEach and a consumer that adds entries to a HashMap.

HashMap is not a synchronized class, and putting elements into it concurrently will not work properly. That is exactly what forEach does here: it invokes the given consumer, which puts elements into the HashMap, from multiple threads, possibly at the same time. Here is a simple piece of code demonstrating the issue:

List<Integer> list = IntStream.range(0, 10000).boxed().collect(Collectors.toList());
Map<Integer, Integer> map = new HashMap<>();
list.parallelStream().forEach(i -> {
    map.put(i, i);
});
System.out.println(list.size());
System.out.println(map.size());

Make sure to run it a couple of times. There is a very good chance (the joy of concurrency) that the map size printed after the operation is not 10000, the size of the list, but slightly less.

The solution here, as always, is not to use forEach but to use a mutable reduction with the collect method and the built-in toMap collector:

Map<Integer, Integer> map = list.parallelStream().collect(Collectors.toMap(i -> i, i -> i));

Use that line in the sample code above, and you can rest assured that the map size will always be 10000. The Stream API ensures that it is safe to collect into a non-thread-safe container, even in parallel. This also means that you don't need toConcurrentMap to be safe; that collector is needed only if you specifically want a ConcurrentMap as the result rather than a general Map. As far as thread safety with collect is concerned, you can use either.
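Applied to the code in the question, the same idea might look like the sketch below. The Person class here is a hypothetical stand-in (the question does not show it), and the merge function is an assumption added to mimic map.put when two people share a name; without it, the two-argument toMap throws IllegalStateException on a duplicate key.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelToMapSketch {

    // Hypothetical stand-in for the question's Person class.
    static final class Person {
        private final String name;
        private final String age;

        Person(String name, String age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        String getAge() { return age; }
    }

    // Each parallel subtask accumulates into its own map; the stream
    // framework then merges the partial maps, so no map is shared
    // between threads while it is being filled.
    static Map<String, String> toNameAgeMap(List<Person> people) {
        return people.parallelStream()
                .collect(Collectors.toMap(
                        Person::getName,
                        Person::getAge,
                        // Assumption: on duplicate names keep the later value, like map.put.
                        (earlier, later) -> later));
    }

    // Builds n people with unique names, purely for demonstration.
    static List<Person> samplePeople(int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> new Person("person" + i, String.valueOf(i % 100)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = samplePeople(10000);
        Map<String, String> map = toNameAgeMap(people);
        System.out.println(people.size());
        System.out.println(map.size());
    }
}
```

Because every name in the sample data is unique, both printed sizes should match on every run.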

HashMap isn't thread-safe, but ConcurrentHashMap is; use that instead:

Map<String, String> map = new ConcurrentHashMap<>();

and your code will work as expected.
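A minimal runnable sketch of this approach, reusing the 10000-integer test from the earlier answer, could look like this. The class and method names are made up for illustration; the second method shows Collectors.toConcurrentMap as a collector-based way to get the same shared-map behaviour.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ConcurrentMapSketch {

    // forEach variant: all threads write into one shared, thread-safe map.
    static Map<Integer, Integer> fillWithForEach(List<Integer> list) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        list.parallelStream().forEach(i -> map.put(i, i));
        return map;
    }

    // Collector variant: toConcurrentMap also accumulates into a single
    // concurrent map instead of merging per-thread maps.
    static ConcurrentMap<Integer, Integer> fillWithCollector(List<Integer> list) {
        return list.parallelStream()
                .collect(Collectors.toConcurrentMap(i -> i, i -> i));
    }

    public static void main(String[] args) {
        List<Integer> list = IntStream.range(0, 10000).boxed().collect(Collectors.toList());
        System.out.println(list.size());
        System.out.println(fillWithForEach(list).size());
        System.out.println(fillWithCollector(list).size());
    }
}
```

Two caveats apply: ConcurrentHashMap rejects null keys and null values with a NullPointerException, and if the input contains duplicate keys, which value ends up in the map after a parallel forEach depends on thread timing.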


Performance comparison of forEach() vs toMap()

After JVM warm-up, with 1M elements, parallel streams and median timings, the forEach() version was consistently 2-3 times faster than the toMap() version.

Results were consistent across all-unique, 25% duplicate and 100% duplicate inputs.
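The benchmark code itself was not posted, so the following is only a rough sketch of how such a comparison might be set up, not the original measurement. It assumes the forEach() version writes into a ConcurrentHashMap, and the warm-up and run counts are arbitrary choices. A dedicated harness such as JMH would give more trustworthy numbers than hand-rolled System.nanoTime() timing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class MapFillTimingSketch {

    // Assumption: the forEach() version targets a ConcurrentHashMap.
    static Map<Integer, Integer> viaForEach(List<Integer> list) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        list.parallelStream().forEach(i -> map.put(i, i));
        return map;
    }

    // The merge function lets this variant accept duplicate keys too.
    static Map<Integer, Integer> viaToMap(List<Integer> list) {
        return list.parallelStream()
                .collect(Collectors.toMap(i -> i, i -> i, (a, b) -> b));
    }

    // Middle element of the sorted samples (upper middle for even counts).
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // Runs the task unmeasured for warm-up, then returns the median
    // wall-clock time in nanoseconds over the measured runs.
    static long medianNanos(Supplier<Map<Integer, Integer>> task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.get();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.get();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        List<Integer> list = IntStream.range(0, 1_000_000).boxed().collect(Collectors.toList());
        System.out.println("forEach median ns: " + medianNanos(() -> viaForEach(list), 10, 21));
        System.out.println("toMap   median ns: " + medianNanos(() -> viaToMap(list), 10, 21));
    }
}
```

Timings will vary with hardware, core count and JVM version, so the ratio reported above should be re-measured rather than assumed.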
