Java 8 Stream API - Does any stateful intermediate operation guarantee a new source collection?

Question

Is the following statement true?

The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state.

(Source and source - they seem to copy from each other or come from the same source.)

I tested Stream::sorted using a snippet from the sources above:

final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

list.stream()
    .filter(i -> i > 5)
    .sorted()
    .forEach(list::remove);

System.out.println(list);            // Prints [0, 1, 2, 3, 4, 5]

It works. I then replaced Stream::sorted with Stream::distinct, Stream::limit and Stream::skip:

final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

list.stream()
    .filter(i -> i > 5)
    .distinct()
    .forEach(list::remove);          // Throws NullPointerException

To my surprise, a NullPointerException is thrown.

All the tested methods are characterized as stateful intermediate operations. Yet this unique behavior of Stream::sorted is not documented, nor does the "Stream operations and pipelines" section explain whether stateful intermediate operations really guarantee a new source collection.

Where does my confusion come from, and what explains the behavior above?

Answer 1

The API documentation makes no guarantee “that subsequent operations no longer operate on the backing collection”, so you should never rely on such behavior of a particular implementation.

Your example happens to do the desired thing by accident; there is not even a guarantee that the List created by collect(Collectors.toList()) supports the remove operation.

Consider this counter-example:

Set<Integer> set = IntStream.range(0, 10).boxed()
    .collect(Collectors.toCollection(TreeSet::new));
set.stream()
    .filter(i -> i > 5)
    .sorted()
    .forEach(set::remove);

It throws a ConcurrentModificationException. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could apply the same optimization to your original example: forEach explicitly performs the action in no specified order, so the sorting is unnecessary.
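To make the "source is already sorted" part concrete, here is a minimal sketch that inspects the characteristics each source reports through its spliterator. The class and the helper reportsSorted are hypothetical names used only for illustration; that the sort-skipping optimization keys on exactly this flag is an assumption about the implementation, not something the Stream documentation promises.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Spliterator;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SortedCharacteristicDemo {

    // Hypothetical helper: does the source's spliterator advertise SORTED?
    static boolean reportsSorted(Collection<Integer> source) {
        return source.spliterator().hasCharacteristics(Spliterator.SORTED);
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = IntStream.range(0, 10).boxed()
            .collect(Collectors.toCollection(TreeSet::new));
        ArrayList<Integer> list = new ArrayList<>(set);

        // TreeSet documents a SORTED spliterator; ArrayList does not.
        System.out.println(reportsSorted(set));  // true
        System.out.println(reportsSorted(list)); // false
    }
}
```

A source that advertises SORTED gives the pipeline enough information to treat sorted() as a no-op, which would leave the stream reading directly from the collection being modified.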

Other optimizations are imaginable, e.g. sorted().findFirst() could be converted to a “find the minimum” operation, without the need to copy the elements into new storage for sorting.

So the bottom line is: when you rely on unspecified behavior, what happens to work today may break tomorrow, when new optimizations are added.
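As a small illustration of why such a rewrite would be legal, the sketch below shows that both forms produce the same result for naturally ordered elements. It only demonstrates the equivalence of results; it does not claim that any JDK actually performs this conversion, and the class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class FirstOfSortedDemo {

    // Sorts the whole stream, then takes the first element.
    static Optional<Integer> viaSort(List<Integer> values) {
        return values.stream().sorted().findFirst();
    }

    // Finds the minimum directly, with no need for intermediate storage.
    static Optional<Integer> viaMin(List<Integer> values) {
        return values.stream().min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(7, 3, 9, 1, 4);
        System.out.println(viaSort(values)); // Optional[1]
        System.out.println(viaMin(values));  // Optional[1]
    }
}
```

Since the observable result is the same, code that depends on sorted() having buffered a copy of the source would break silently if the first form were ever executed like the second.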

Answer 2

Well, sorted has to be a full copying barrier for the stream pipeline, since your source might not be sorted; but this is not documented as such, so do not rely on it.

This is not just about sorted per se, but about what other optimizations can be applied to the stream pipeline so that sorted could be skipped entirely. For example:

List<Integer> sortedList = IntStream.range(0, 10)
    .boxed()
    .collect(Collectors.toList());

StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
    .sorted()
    .forEach(sortedList::remove); // fails with CME, thus no copying occurred

Of course, sorted needs to be a full barrier and stop to do an entire sort, unless it can be skipped; that is why the documentation makes no such promise, so that we don't run into weird surprises.

distinct, on the other hand, does not have to be a full barrier: all it does is check one element at a time for uniqueness, so once a single element is checked (and found to be unique) it is passed to the next stage, without any full barrier. Either way, this is not documented either...
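The difference can be made visible by logging when elements enter and leave each operation. This is only a sketch with invented names (BarrierDemo, traceDistinct, traceSorted), and the outputs in the comments are what the sequential OpenJDK implementation is expected to produce; none of this ordering is specified, which is exactly the point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class BarrierDemo {

    // Logs each element before distinct() and after it.
    static List<String> traceDistinct() {
        List<String> log = new ArrayList<>();
        Stream.of(3, 1, 2)
            .peek(i -> log.add("in " + i))
            .distinct()
            .forEach(i -> log.add("out " + i));
        return log;
    }

    // Same pipeline, with sorted() in place of distinct().
    static List<String> traceSorted() {
        List<String> log = new ArrayList<>();
        Stream.of(3, 1, 2)
            .peek(i -> log.add("in " + i))
            .sorted()
            .forEach(i -> log.add("out " + i));
        return log;
    }

    public static void main(String[] args) {
        // Expected on OpenJDK (implementation detail, not a guarantee):
        System.out.println(traceDistinct()); // [in 3, out 3, in 1, out 1, in 2, out 2]
        System.out.println(traceSorted());   // [in 3, in 1, in 2, out 1, out 2, out 3]
    }
}
```

With distinct the elements flow through one by one, so the source is still being traversed while the terminal action runs; with sorted everything upstream is consumed first, unless the sort is skipped.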

Answer 3

You shouldn't have brought up cases with the terminal operation forEach(list::remove), because list::remove is an interfering function and violates the "non-interference" principle for terminal actions.

It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour. 在了解为什么不正确的代码段导致意外（或未记录）行为之前，遵循规则至关重要。

I believe that list::remove is the root of the problem here. You wouldn't have noticed the difference between the operations in this scenario if you'd written a proper action for forEach.
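For completeness, here is one possible way to get the intended result without modifying the source while a stream is traversing it. This is a sketch, not the only option: the class and method names are hypothetical, and Collectors.toCollection(ArrayList::new) is used on the assumption that a list which definitely supports removal is wanted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class NonInterferingRemoval {

    // Builds 0..9 in a list that is guaranteed to be a mutable ArrayList.
    static List<Integer> newList() {
        return IntStream.range(0, 10).boxed()
            .collect(Collectors.toCollection(ArrayList::new));
    }

    // Option 1: no stream at all, let the collection remove its own elements.
    static List<Integer> removeGreaterThan(int threshold) {
        List<Integer> list = newList();
        list.removeIf(i -> i > threshold);
        return list;
    }

    // Option 2: finish the stream first, then modify the source afterwards.
    static List<Integer> removeViaSnapshot(int threshold) {
        List<Integer> list = newList();
        List<Integer> toRemove = list.stream()
            .filter(i -> i > threshold)
            .collect(Collectors.toList());
        list.removeAll(toRemove);
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeGreaterThan(5)); // [0, 1, 2, 3, 4, 5]
        System.out.println(removeViaSnapshot(5)); // [0, 1, 2, 3, 4, 5]
    }
}
```

In both variants the source is only modified when no stream pipeline is running over it, so the result no longer depends on which intermediate operations happen to buffer.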

Answer 1: 31, accepted, 2018-09-11 09:22:02
Answer 2: 7, 2018-09-11 09:24:23
Answer 3: 3, 2018-09-11 10:44:47
