Mutation reductions for parallelStream in Java 8

In Effective Java (Third Edition), Joshua Bloch mentions that:

"The operations performed by Stream's collect method, which are known as mutable reductions, are not good candidates for parallelism because the overhead of combining collections is costly."

I read the docs on Mutable reduction, but I am still not quite sure why reduction is not a good candidate for parallelism. Is it the synchronization?

As @Ravindra Ranwala points out (I also saw this in the Reduction, concurrency, and ordering docs):

"It may actually be counterproductive to perform the operation in parallel. This is because the combining step (merging one Map into another by key) can be expensive for some Map implementations."

If so, are there other major factors we need to watch for that might result in poor performance?

No, it has nothing to do with synchronization. Suppose you have 1 million Person objects and need to find all the people who live in New York. A typical stream pipeline would be:

people.parallelStream()
    .filter(p -> p.getState().equals("NY"))
    .collect(Collectors.toList());

Consider a parallel execution of this query. Let's say we have 10 threads executing it in parallel. Each thread accumulates its own portion of the data into a separate local container. Finally, the 10 result containers are merged to form one large container. This merge is costly, and it is an additional step that only the parallel execution introduces. Hence parallel execution may not always be faster. Sometimes sequential execution is faster than its parallel counterpart.
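To make the extra merge step visible, here is a minimal sketch, not a benchmark. The class name CollectMergeDemo, the helper countingToList, and the integer data set are made up for illustration; only Collector.of and the stream calls are standard library API. The collector behaves like a plain list collector but counts how often the supplier (a new local container) and the combiner (a merge) are invoked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class CollectMergeDemo {
    // Counters for intermediate containers created and merges performed.
    static final AtomicInteger containers = new AtomicInteger();
    static final AtomicInteger merges = new AtomicInteger();

    // Hypothetical list collector that records supplier and combiner calls.
    static Collector<Integer, List<Integer>, List<Integer>> countingToList() {
        return Collector.of(
            () -> { containers.incrementAndGet(); return new ArrayList<>(); },
            List::add,
            (left, right) -> { merges.incrementAndGet(); left.addAll(right); return left; });
    }

    // Collects the even numbers below one million, sequentially or in parallel.
    static List<Integer> evens(boolean parallel) {
        containers.set(0);
        merges.set(0);
        IntStream range = IntStream.range(0, 1_000_000);
        if (parallel) {
            range = range.parallel();
        }
        return range.boxed().filter(i -> i % 2 == 0).collect(countingToList());
    }

    public static void main(String[] args) {
        evens(false);
        System.out.println("sequential: containers=" + containers + ", merges=" + merges);
        evens(true);
        System.out.println("parallel:   containers=" + containers + ", merges=" + merges);
    }
}
```

The sequential run uses a single container and never calls the combiner. The parallel run reports several containers and merges; the exact numbers depend on how the runtime splits the source, so they will vary between machines.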

So always start with a sequential execution. Switch to its parallel counterpart at some later point only if that turns out to make sense.
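One rough way to check whether going parallel makes sense is to time both variants on your own data. The sketch below assumes a simplified Person class with only a getState() method (the real class from the question is not shown), and the names SequentialFirst, samplePeople, newYorkers, and timeNanos are hypothetical. Wall-clock timing like this is crude; a harness such as JMH gives more trustworthy numbers.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SequentialFirst {
    // Hypothetical stand-in for the Person class from the question.
    static final class Person {
        private final String state;
        Person(String state) { this.state = state; }
        String getState() { return state; }
    }

    // Builds n people; every tenth one lives in NY (made-up test data).
    static List<Person> samplePeople(int n) {
        return IntStream.range(0, n)
            .mapToObj(i -> new Person(i % 10 == 0 ? "NY" : "CA"))
            .collect(Collectors.toList());
    }

    // Same pipeline as in the answer, switchable between sequential and parallel.
    static List<Person> newYorkers(List<Person> people, boolean parallel) {
        return (parallel ? people.parallelStream() : people.stream())
            .filter(p -> p.getState().equals("NY"))
            .collect(Collectors.toList());
    }

    // Crude wall-clock timing of a single run.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<Person> people = samplePeople(1_000_000);
        // Repeat a few rounds so the JIT has a chance to warm up.
        for (int round = 0; round < 5; round++) {
            long seq = timeNanos(() -> newYorkers(people, false));
            long par = timeNanos(() -> newYorkers(people, true));
            System.out.printf("round %d: sequential %d ms, parallel %d ms%n",
                round, seq / 1_000_000, par / 1_000_000);
        }
    }
}
```

If the parallel timings are not clearly better on realistic input sizes, the sequential pipeline is the simpler choice.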
