
What's the time complexity of Java Stream distinct().sorted()?

Whenever I have a coding interview, I avoid using Java streams, because I can't analyze their time complexity very well.

For example, in my daily work I might write something like this:

Arrays.stream(a).distinct().sorted().toArray();

to get the unique numbers and sort them.

But I'm curious what the time complexity will be. Will distinct().sorted() turn into a nested loop?

Do I need to separate them?

int[] arr = Arrays.stream(a).distinct().toArray();
Arrays.stream(arr).sorted().toArray();

So in interviews I sometimes use a Set to remove the duplicates and then sort the result, but I would really like to write clean code.

If anyone can help, thank you!

There is no definite statement possible, as you are developing against an interface and the specification does not mandate a particular sort algorithm.

For the generic Stream, we have to assume a comparison sort, whose worst case of O(n log n) cannot be avoided by any algorithm.

But your example uses an IntStream which, in principle, could use a counting sort or a similar algorithm with O(n) time complexity. This doesn't happen in practice with the reference implementation, as a better worst-case time complexity does not necessarily lead to better performance in practice, and the maximum number of elements is limited to the JVM's maximum array size.

The time complexity of distinct() is O(n), as it just checks whether adding into a HashSet succeeds. Combining the O(n) with O(n log n) leads to an overall complexity of O(n log n). Maybe the interviewer got this combination of time complexities wrong.
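As a rough illustration of that reasoning, the pipeline can be written out by hand as one pass over a HashSet followed by a sort. This is only a sketch of the idea, not the stream implementation itself; the class name DistinctThenSort and the method name distinctSorted are made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class DistinctThenSort {
    // Hand-written counterpart of Arrays.stream(a).distinct().sorted().toArray().
    // Hypothetical helper for illustration only.
    static int[] distinctSorted(int[] a) {
        Set<Integer> seen = new HashSet<>();
        int[] unique = new int[a.length];
        int count = 0;
        // Step 1: one pass, one HashSet.add per element (the O(n) part).
        for (int value : a) {
            if (seen.add(value)) { // add returns false for a duplicate
                unique[count++] = value;
            }
        }
        // Step 2: sort only the unique values (the O(n log n) part).
        int[] result = Arrays.copyOf(unique, count);
        Arrays.sort(result);
        return result;
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 5, 1, 3, 2};
        System.out.println(Arrays.toString(distinctSorted(a)));
        System.out.println(Arrays.toString(Arrays.stream(a).distinct().sorted().toArray()));
    }
}
```

The two steps run one after the other, not nested, so their costs add up instead of multiplying.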

But this is a good example demonstrating that time complexity is not equal to performance. When you use sorted().distinct(), the distinct() operation will utilize the sorted nature of the incoming elements, which makes it unnecessary to build a HashSet behind the scenes¹. Since the reference implementation has no primitive value set, this eliminates a lot of boxing overhead. On the other hand, using distinct().sorted() could reduce the number of elements to sort, but it requires significantly fewer distinct elements than total stream elements to pay off.
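For a terminal operation like toArray, both orderings yield the same array, so the choice only affects performance. A small sketch to compare them side by side (the class and method names are hypothetical; which variant is faster depends on the data and the JVM version, so it would have to be measured):

```java
import java.util.Arrays;

class OrderOfOperations {
    // Removes duplicates first, then sorts the (possibly smaller) remainder.
    static int[] distinctFirst(int[] a) {
        return Arrays.stream(a).distinct().sorted().toArray();
    }

    // Sorts first, so distinct() may exploit the sorted input (see footnote 1).
    static int[] sortedFirst(int[] a) {
        return Arrays.stream(a).sorted().distinct().toArray();
    }

    public static void main(String[] args) {
        int[] a = {4, 1, 4, 2, 1};
        System.out.println(Arrays.toString(distinctFirst(a)));
        System.out.println(Arrays.toString(sortedFirst(a)));
    }
}
```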

Such performance differences are not covered by the time complexity, which is still the same for both approaches. But as said, for streams of primitive types, different algorithms with a different time complexity would be possible.

But one thing we can say for sure: when you split the operations into two stream operations, like requesting a result array from the first and passing it to Arrays.stream again, there is no chance that the underlying implementation utilizes knowledge about the previous operation in the next operation.

Note that the statements above assume a terminal operation like your example's toArray, which consumes all elements and requires the resulting encounter order to be maintained. With other, short-circuiting or unordered terminal operations, the overall time complexity could change, e.g. sorted().findFirst() could get optimized to the equivalent of min(), or the sorting step could get eliminated for an unordered terminal operation like sum(). That does not happen in the current reference implementation, but, as said, you're programming against an interface.
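To make the sorted().findFirst() case concrete, the following sketch shows that both forms return the same value, which is why such an optimization would be legal. The class and method names are invented for this example, and nothing here implies that any implementation actually performs the rewrite.

```java
import java.util.Arrays;
import java.util.OptionalInt;

class FirstOfSorted {
    // Sorts the whole stream, then takes the first element.
    static OptionalInt viaSort(int[] a) {
        return Arrays.stream(a).sorted().findFirst();
    }

    // Single pass without sorting; same result as viaSort.
    static OptionalInt viaMin(int[] a) {
        return Arrays.stream(a).min();
    }

    public static void main(String[] args) {
        int[] a = {7, 2, 9, 2, 5};
        System.out.println(viaSort(a));
        System.out.println(viaMin(a));
    }
}
```

If only the smallest element is needed, writing min() directly avoids relying on such an optimization.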


¹ For primitive streams, this only works in Java 9+. As said, there is no primitive specialization for distinct(); it is implemented like boxed().distinct().mapToInt(i -> i), and in Java 8, boxed() is implemented as mapToObj(Integer::valueOf), which loses the information about the sorted input, as explained in the last section of this answer.
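The equivalence described in the footnote can be spelled out as follows. This is only a sketch to show the boxing round trip; the class and method names are hypothetical, and the explicit variant is written here for illustration, not copied from the JDK sources.

```java
import java.util.Arrays;

class BoxedDistinct {
    // The primitive stream's own distinct().
    static int[] viaIntStream(int[] a) {
        return Arrays.stream(a).distinct().toArray();
    }

    // The same operation with the boxing made explicit:
    // int -> Integer, distinct on objects, Integer -> int.
    static int[] viaBoxing(int[] a) {
        return Arrays.stream(a).boxed().distinct().mapToInt(i -> i).toArray();
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 3, 2, 1};
        System.out.println(Arrays.toString(viaIntStream(a)));
        System.out.println(Arrays.toString(viaBoxing(a)));
    }
}
```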
