
Why is pipeline grouping using Java's Stream API collectors taking more time?

I'm creating a Map of Maps from a list using the Stream API. The list contains 10 million records. I did this in a single statement using two nested `groupingBy` operations, as shown below. The problem is that this one statement takes almost 1.5 minutes to execute, which has become a bottleneck in my performance-critical application.

Below is the code I tried, using the parallel stream API:

Map<MyKey, Map<String, List<Person>>> personMap = personList.parallelStream()
    .collect(Collectors.groupingBy(
        person -> new MyKey(person.getId(), person.getPricePointId()),
        Collectors.groupingBy(Person::getWorkType)));

The code above takes more than 1.5 minutes to execute, which is almost 75% of my overall execution time, and I have not found any faster solution. So my question is: is this the maximum possible throughput for this volume of data? Or is using a downstream collector (nested `groupingBy`) not the right option here? If not, what would be the right way to reduce the execution time?
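One thing worth trying: with `Collectors.groupingBy` on a parallel stream, each worker thread builds its own map and the per-thread maps are merged at the end, which can be expensive for millions of keys. `Collectors.groupingByConcurrent` instead accumulates directly into a single `ConcurrentHashMap`. A minimal sketch, where the `Person` and `MyKey` records are simplified stand-ins for the question's classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class GroupingDemo {
    // Simplified stand-ins for the Person and MyKey types from the question.
    record Person(int id, int pricePointId, String workType) {}
    record MyKey(int id, int pricePointId) {}

    public static void main(String[] args) {
        List<Person> personList = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            personList.add(new Person(i % 10, i % 5,
                (i % 2 == 0) ? "FULL_TIME" : "PART_TIME"));
        }

        // groupingByConcurrent writes into one shared ConcurrentHashMap
        // instead of building one map per worker thread and merging them.
        ConcurrentMap<MyKey, Map<String, List<Person>>> personMap =
            personList.parallelStream()
                .collect(Collectors.groupingByConcurrent(
                    p -> new MyKey(p.id(), p.pricePointId()),
                    Collectors.groupingBy(Person::workType)));

        System.out.println(personMap.size());
    }
}
```

Note that `groupingByConcurrent` is an unordered collector, so element order within each inner `List` is not guaranteed; whether that matters depends on your use case. It also relies on `MyKey` having correct `hashCode`/`equals` implementations (records provide these automatically).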

What you are doing is a bad idea: loading 10 million records in a single query takes a lot of memory. Instead, split the query into pages using a limit and a starting offset, run each page's query in a separate thread, and finally join the threads and merge the results. For your use case this will be both faster and more memory-efficient.
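A sketch of the paging-and-threading idea this answer describes, assuming the data can be sliced by offset and limit. The database query is simulated here with an in-memory list, and the grouping key (`v % 10`) is a hypothetical placeholder for the real key extraction:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedGrouping {
    public static void main(String[] args) throws Exception {
        // Stand-in for the full data set (in practice: one DB page per chunk).
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) data.add(i);

        int chunks = 4;
        int chunkSize = (data.size() + chunks - 1) / chunks;
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        List<Future<Map<Integer, List<Integer>>>> futures = new ArrayList<>();

        // Each thread groups one slice (offset/limit window) independently.
        for (int c = 0; c < chunks; c++) {
            int from = c * chunkSize;
            int to = Math.min(from + chunkSize, data.size());
            List<Integer> slice = data.subList(from, to);
            futures.add(pool.submit(() -> {
                Map<Integer, List<Integer>> local = new HashMap<>();
                for (Integer v : slice) {
                    local.computeIfAbsent(v % 10, k -> new ArrayList<>()).add(v);
                }
                return local;
            }));
        }

        // "Join the threads": merge the per-chunk maps into one result.
        Map<Integer, List<Integer>> merged = new HashMap<>();
        for (Future<Map<Integer, List<Integer>>> f : futures) {
            f.get().forEach((k, v) ->
                merged.merge(k, v, (a, b) -> { a.addAll(b); return a; }));
        }
        pool.shutdown();

        System.out.println(merged.get(0).size());
    }
}
```

Note that this is structurally the same work a parallel stream does (partition, group locally, merge), so the real win here comes from paging the database query rather than from the threading itself.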
