
Complexity of grouping in Java 8

I would like to learn the time complexity of the statement below (in Java 8):

list.stream().collect(groupingBy(...)); 

Any idea?

There is no general answer to that question, as the time complexity depends on all operations involved. Since the stream has to be processed entirely, there is a base time complexity of O(n), which has to be multiplied by the cost of all operations performed per element. This assumes that the cost of the iteration itself is not worse than O(n), which is the case for most stream sources.

So, assuming no intermediate operations that affect the time complexity, groupingBy has to evaluate the classifier function for each element. That evaluation should be independent of the other elements, so it does not affect the time complexity, regardless of how expensive it is, as the O(…) time complexity only tells us how the time scales with large numbers of stream elements. Then, it will insert the element into a map, and the cost of that might depend on the number of elements already contained. Without a custom Map supplier, the map's type is unspecified, hence no statement can be made here.
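To take the guesswork out of the unspecified types, both the map and the list can be supplied explicitly through the three-argument overload of groupingBy. The following is only a minimal sketch; the class name ExplicitGrouping, the method byLength and the sample words are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical example class, not part of the original question.
class ExplicitGrouping {
    // Groups words by length, pinning both container types explicitly.
    static Map<Integer, List<String>> byLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
                String::length,                            // classifier, evaluated once per element
                HashMap::new,                              // map factory: guaranteed to be a HashMap
                Collectors.toCollection(ArrayList::new))); // downstream: guaranteed to be an ArrayList
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> groups = byLength(Arrays.asList("a", "bb", "cc", "ddd"));
        System.out.println(groups.get(2)); // the two-letter words, in encounter order
    }
}
```

With the types fixed like this, the reasoning about hash lookups and list appends applies by construction rather than by assumption about the current implementation.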

In practice, it's reasonable to assume that the result will be some sort of hash map with a net O(1) lookup complexity by default. So we have a net time complexity of O(n) for the grouping. Then, we have the downstream collector.

The default downstream collector is toList(), which produces an unspecified List type, so again, we can't say anything about the costs of adding elements to it.

The current implementation produces an ArrayList, which has to perform copy operations when the capacity is exceeded, but since the capacity is raised by a constant factor each time, there is still a net complexity of O(n) for adding n elements. It's reasonable to assume that future changes to the toList() implementation won't make the costs worse than what we have today. So the time complexity of a default groupingBy collection is likely O(n).
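The O(n) bound for growing by a factor follows from a geometric series. As an idealized sketch, assume an initial capacity c_0 and a growth factor g > 1 (both symbols are introduced here for illustration, and rounding is ignored). A resize at capacity c_0 g^k copies c_0 g^k elements, and resizes only happen while c_0 g^k < n, so with m being the index of the last resize:

```latex
\text{copies}(n) \;=\; \sum_{k=0}^{m} c_0\, g^{k}
  \;=\; c_0\,\frac{g^{m+1}-1}{g-1}
  \;<\; \frac{g}{g-1}\; c_0\, g^{m}
  \;<\; \frac{g}{g-1}\; n ,
  \qquad \text{where } c_0\, g^{m} < n .
```

For a growth factor of g = 3/2, which is what the OpenJDK ArrayList uses as far as I know, this gives fewer than 3n copied elements in total, so n appends cost O(n) overall.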

If we use a custom Map supplier with a custom downstream collector, the complexity depends on the ratio between the number of groups and the average number of elements per group. The worst case would be the worse of the two, the map's lookup or the downstream collector's element processing (times the number of elements), as we could have one group containing all items or each item being in its own group.
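The two extremes can be tried out with a sorted map, whose lookup cost grows with the number of groups. This is a rough sketch under assumed names (GroupingExtremes and group are hypothetical): with a TreeMap, a single group keeps every lookup at constant cost, whereas n distinct groups cost O(log n) per lookup and therefore O(n log n) in total.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class, not part of the original answer.
class GroupingExtremes {
    // Groups the integers 0..n-1 into a TreeMap and counts the elements per group.
    static Map<Integer, Long> group(int n, Function<Integer, Integer> classifier) {
        return IntStream.range(0, n).boxed().collect(Collectors.groupingBy(
                classifier,              // decides how many groups exist
                TreeMap::new,            // custom map: O(log k) lookup for k groups
                Collectors.counting())); // custom downstream: O(1) per element
    }

    public static void main(String[] args) {
        int n = 1_000;
        Map<Integer, Long> oneGroup = group(n, i -> 0); // all elements in one group
        Map<Integer, Long> nGroups = group(n, i -> i);  // every element in its own group
        System.out.println(oneGroup.size() + " group vs. " + nGroups.size() + " groups");
    }
}
```

Swapping Collectors.counting() for a downstream collector with a non-constant per-element cost would shift the expensive case to the single-group extreme instead.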

But usually, you are able to predict a bias for a particular grouping operation, so you would want to calculate the time complexity for that particular operation instead of relying on a statement about all grouping operations in general.
