When to use Collectors.groupingByConcurrent?

I don't understand the exact use case for Collectors.groupingByConcurrent. From the JavaDocs:

Returns a concurrent Collector implementing a cascaded "group by" operation on input elements of type T...
This is a concurrent and unordered Collector.
...

Maybe the key phrase here is cascaded "group by". Does that say something about how the collector actually performs the accumulation? (I looked at the source, but it got intricate very quickly.)


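When I test it with a fake ConcurrentMap:

// Deliberately NOT thread-safe: it only claims the ConcurrentMap interface,
// while every operation is inherited from the unsynchronized HashMap.
class FakeConcurrentMap<K, V> extends HashMap<K, V> 
    implements ConcurrentMap<K, V> {}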

I see that it breaks (gives wrong aggregations as the map isn't thread-safe) with parallel streams:

Map<Integer, Long> counts4 = IntStream.range(0, 1000000)
        .boxed()
        .parallel()
        .collect(
            Collectors.groupingByConcurrent(i -> i % 10, 
                                          FakeConcurrentMap::new, 
                                          Collectors.counting()));

Without .parallel(), the results are consistently correct. So it seems that groupingByConcurrent is meant to go with parallel streams.

But, as far as I can see, the following parallel stream collected with groupingBy always produces correct results:

Map<Integer, Long> counts3 = IntStream.range(0, 1000000)
        .boxed()
        .parallel()
        .collect(
            Collectors.groupingBy(i -> i % 10, 
                                  HashMap::new,
                                  Collectors.counting()));

So when is it correct to use groupingByConcurrent instead of groupingBy (surely that can't be just to get groupings as a concurrent map)?

All Collectors work just fine with parallel streams, but Collectors that support direct concurrency (those reporting Collector.Characteristics.CONCURRENT) are eligible for optimizations that the others are not. groupingByConcurrent falls into this category.
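One way to see the difference is to ask each collector for its characteristics. The sketch below is only an illustration; the class and method names are made up for this example. It uses the same i % 10 classifier and counting() downstream as the question.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical demo class; the name is not from the original post.
class CollectorCharacteristicsDemo {

    // Characteristics reported by the ordinary groupingBy collector.
    static Set<Collector.Characteristics> plainCharacteristics() {
        Collector<Integer, ?, Map<Integer, Long>> plain =
                Collectors.groupingBy(i -> i % 10, Collectors.counting());
        return plain.characteristics();
    }

    // Characteristics reported by groupingByConcurrent with the same arguments.
    static Set<Collector.Characteristics> concurrentCharacteristics() {
        Collector<Integer, ?, ConcurrentMap<Integer, Long>> concurrent =
                Collectors.groupingByConcurrent(i -> i % 10, Collectors.counting());
        return concurrent.characteristics();
    }

    public static void main(String[] args) {
        System.out.println("groupingBy:           " + plainCharacteristics());
        System.out.println("groupingByConcurrent: " + concurrentCharacteristics());
    }
}
```

The second set should contain both CONCURRENT and UNORDERED, which is what the JavaDoc quoted in the question means by "a concurrent and unordered Collector".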

(Roughly, what happens is that a non-concurrent collector breaks the input into per-thread pieces, creates an accumulator per thread, and then merges them at the end. A concurrent (and unordered) collector creates one accumulator and has several worker threads concurrently merging elements into the same accumulator.)
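The two strategies can be imitated by hand. This is a simplified sketch of the idea, not the JDK's actual implementation, and the class and method names are invented for illustration. The first method gives each worker its own HashMap and merges the partial maps at the end. The second lets all workers update a single ConcurrentHashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.IntStream;

// Hypothetical demo class; the names are not from the original post.
class AccumulationStrategies {

    // Non-concurrent style: one HashMap per chunk of work, merged pairwise afterwards.
    static Map<Integer, Long> perThreadThenMerge(int n) {
        return IntStream.range(0, n).boxed().parallel()
                .<Map<Integer, Long>>collect(
                        HashMap::new,                                  // new container per chunk
                        (map, i) -> map.merge(i % 10, 1L, Long::sum),  // no sharing, so no locking needed
                        (left, right) -> right.forEach(
                                (k, v) -> left.merge(k, v, Long::sum))); // combine partial counts
    }

    // Concurrent style: a single shared map that every worker thread updates directly.
    static Map<Integer, Long> sharedAccumulator(int n) {
        ConcurrentMap<Integer, Long> shared = new ConcurrentHashMap<>();
        IntStream.range(0, n).boxed().parallel()
                .forEach(i -> shared.merge(i % 10, 1L, Long::sum)); // merge is atomic on ConcurrentHashMap
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(perThreadThenMerge(1_000_000));
        System.out.println(sharedAccumulator(1_000_000));
    }
}
```

Both methods produce the same counts. The second skips the map-merging step, which is why the shared container has to be genuinely thread-safe, and why the FakeConcurrentMap from the question gives wrong results.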
