What is the time complexity of Arrays.stream(): O(n) or O(n log n)?

The Stream API documentation says:

certain stream sources (such as List or arrays) are intrinsically ordered, whereas others (such as HashSet) are not.

What would be the time complexity of the Arrays.stream() method?

O(n log n), since it returns a sorted array, or O(n), as we would expect from the stream() methods?

O(n) or O(n log n)?

Neither of these.

Firstly, it seems like you're confusing a stream whose elements are sorted with an ordered stream, i.e. a stream which has a particular encounter order of elements.

Whether a stream is ordered or not depends on the stream source and the intermediate operations in it.

A stream created over an array, or over an ordered collection such as a List or a Queue, is ordered according to the order of the elements in its source, but this does not imply that the stream is sorted.
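To make the distinction concrete, here is a minimal sketch over a deliberately unsorted array. The class and method names (OrderedVsSortedDemo, encounterOrder, has) are made up for this illustration; the Spliterator characteristics it checks are the standard ORDERED and SORTED flags.

```java
import java.util.Arrays;
import java.util.Spliterator;

class OrderedVsSortedDemo {

    // Returns the elements in the order in which the stream delivers them.
    static int[] encounterOrder(int[] source) {
        return Arrays.stream(source).toArray();
    }

    // True if a stream over the array reports the given spliterator characteristic.
    static boolean has(int[] source, int characteristic) {
        return Arrays.stream(source).spliterator().hasCharacteristics(characteristic);
    }

    public static void main(String[] args) {
        int[] source = {3, 1, 2}; // deliberately not sorted

        // The encounter order is the array order; nothing gets sorted.
        System.out.println(Arrays.toString(encounterOrder(source)));          // [3, 1, 2]
        System.out.println("ORDERED: " + has(source, Spliterator.ORDERED));   // true
        System.out.println("SORTED:  " + has(source, Spliterator.SORTED));    // false
    }
}
```

The stream is ordered (it has an encounter order) without being sorted; sorting only happens if sorted() is added to the pipeline explicitly.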

We can make a stream unordered by applying the unordered() operation to it. This operation alone does not change the stream's data, but it affects the execution of stateful intermediate operations that require buffering, such as takeWhile(), and of terminal operations, such as reduce() and collect(), that guarantee to respect the initial encounter order. As a result, a parallel unordered stream might perform better because this constraint is loosened.
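As a rough sketch of what unordered() does, the following checks the ORDERED characteristic before and after the call, and runs a parallel distinct() without the ordering constraint. The names UnorderedDemo, isOrdered and distinctUnordered are hypothetical helpers for this illustration, and whether dropping the order actually speeds anything up depends on the pipeline and the data.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UnorderedDemo {

    // Reports whether the stream still advertises an encounter order.
    static boolean isOrdered(Stream<?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    // Removes duplicates in parallel without requiring the encounter order to be kept.
    static List<Integer> distinctUnordered(List<Integer> source) {
        return source.parallelStream()
                .unordered()
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(3, 1, 2, 3, 1);

        System.out.println(isOrdered(source.stream()));             // true
        System.out.println(isOrdered(source.stream().unordered())); // false

        // Contains 1, 2 and 3 exactly once each; their order is not guaranteed.
        System.out.println(distinctUnordered(source));
    }
}
```

The data itself is untouched by unordered(); only the guarantee about the order in which results come out is relaxed.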

Here is a quote from the API documentation:

Ordering

Streams may or may not have a defined encounter order. Whether or not a stream has an encounter order depends on the source and the intermediate operations. Certain stream sources (such as List or arrays) are intrinsically ordered, whereas others (such as HashSet) are not. Some intermediate operations, such as sorted(), may impose an encounter order on an otherwise unordered stream, and others may render an ordered stream unordered, such as BaseStream.unordered(). Further, some terminal operations may ignore encounter order, such as forEach().

If a stream is ordered, most operations are constrained to operate on the elements in their encounter order; if the source of a stream is a List containing [1, 2, 3], then the result of executing map(x -> x*2) must be [2, 4, 6]. However, if the source has no defined encounter order, then any permutation of the values [2, 4, 6] would be a valid result.

For sequential streams, the presence or absence of an encounter order does not affect performance, only determinism. If a stream is ordered, repeated execution of identical stream pipelines on an identical source will produce an identical result; if it is not ordered, repeated execution might produce different results.

For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Certain aggregate operations, such as filtering duplicates (distinct()) or grouped reductions (Collectors.groupingBy()) can be implemented more efficiently if ordering of elements is not relevant. Similarly, operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. In cases where the stream has an encounter order, but the user does not particularly care about that encounter order, explicitly de-ordering the stream with unordered() may improve parallel performance for some stateful or terminal operations. However, most stream pipelines, such as the "sum of weight of blocks" example above, still parallelize efficiently even under ordering constraints.

Secondly, because you're assuming that creating a stream over an array costs at least O(n), you might have a misconception regarding the nature of streams.

In essence, a stream is a means of iteration; it is not a container of data like a Collection.

Creating a stream does not require dumping all the data from the source into memory. We only create an internal iterator over the data source, and this action has a time complexity of O(1).

Streams are lazy: every action in the stream pipeline occurs only when it is needed, and elements from the source are processed one by one.
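A small sketch can make this laziness visible by counting how many source elements a pipeline actually touches. The class LazyStreamDemo and its two helper methods are invented for this illustration, and peek() is used purely as a probe.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class LazyStreamDemo {

    // Builds a pipeline but never runs a terminal operation on it.
    static int touchedWithoutTerminalOp(int[] source) {
        AtomicInteger touched = new AtomicInteger();
        IntStream pipeline = Arrays.stream(source).peek(x -> touched.incrementAndGet());
        // No terminal operation is invoked, so no element is ever pulled from the array.
        return touched.get();
    }

    // Runs a short-circuiting terminal operation that needs a single element.
    static int touchedByFindFirst(int[] source) {
        AtomicInteger touched = new AtomicInteger();
        Arrays.stream(source).peek(x -> touched.incrementAndGet()).findFirst();
        return touched.get();
    }

    public static void main(String[] args) {
        int[] source = new int[1_000_000];

        System.out.println(touchedWithoutTerminalOp(source)); // 0
        System.out.println(touchedByFindFirst(source));       // 1
    }
}
```

Creating the stream over a million elements costs the same as creating it over ten, because the work is deferred until a terminal operation asks for elements.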

For instance, let's assume we have an integer array containing 1,000,000 elements, and we want to get the first 10 elements from it as hexadecimal strings:

List<String> result = Arrays.stream(sourceArray)
    .mapToObj(Integer::toHexString)
    .limit(10)
    .toList();

On execution, only the first 10 elements are retrieved from the source array, and then the stream terminates immediately, producing the result.

The overall time complexity of such a stream is O(1) with respect to the size of the source, because we only care about a fixed number of elements at the very beginning and don't need all the data that the source contains.
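One way to check this claim is to instrument the same pipeline with a counter. The sketch below assumes a sequential stream; LimitDemo, firstHex and elementsConsumed are hypothetical names, and the peek() call exists only to count how many source elements are pulled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class LimitDemo {

    // The pipeline from the example above, parameterized by the limit.
    static List<String> firstHex(int[] sourceArray, int n) {
        return Arrays.stream(sourceArray)
                .mapToObj(Integer::toHexString)
                .limit(n)
                .collect(Collectors.toList());
    }

    // Same pipeline with a counter in front, to see how many elements are pulled.
    static int elementsConsumed(int[] sourceArray, int n) {
        AtomicInteger consumed = new AtomicInteger();
        Arrays.stream(sourceArray)
                .peek(x -> consumed.incrementAndGet())
                .mapToObj(Integer::toHexString)
                .limit(n)
                .collect(Collectors.toList());
        return consumed.get();
    }

    public static void main(String[] args) {
        int[] sourceArray = IntStream.range(0, 1_000_000).toArray();

        System.out.println(firstHex(sourceArray, 12));         // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b]
        System.out.println(elementsConsumed(sourceArray, 10)); // 10
    }
}
```

The count depends on the limit, not on the array length, which is why the cost stays constant as the source grows. A parallel pipeline may pull more elements than the limit, so this sketch only speaks for the sequential case.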
