Strange behavior of Stream.spliterator for parallel streams

I'm using the stream spliterator directly for low-level operations in a library I'm writing. Recently I discovered very weird behavior when I take the stream spliterator and interleave tryAdvance/trySplit calls. Here is a simple piece of code that demonstrates the problem:

import java.util.Arrays;
import java.util.Spliterator;

public class SpliteratorBug {
    public static void main(String[] args) {
        Integer[][] input = { { 1 }, { 2, 3 }, { 4, 5, 6 }, { 7, 8 }, { 9 } };
        Spliterator<Integer> spliterator = Arrays.stream(input).parallel()
                .flatMap(Arrays::stream).spliterator();
        spliterator.trySplit();
        spliterator.tryAdvance(s -> {});
        spliterator.trySplit();
        spliterator.forEachRemaining(System.out::println);
    }
}

The output is:

5
6
9

As you can see, after flat-mapping I should get an ordered stream of consecutive numbers from 1 to 9. I split the spliterator once, so it should jump to some intermediate location. Next I consume one element from it and split it one more time. After that I print all the remaining elements. I expect to get several consecutive elements from the tail of the stream (possibly zero elements, which would also be fine). However, what I get is 5 and 6, then a sudden jump to 9.

I know that the JDK currently does not use spliterators this way: it always splits before traversal. However, the official documentation does not explicitly forbid calling trySplit after tryAdvance.

The problem never occurs when I use a spliterator created directly from a collection, array, generated source, etc. It occurs only if the spliterator was created from a parallel stream that has an intermediate flatMap.
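For comparison, here is a minimal sketch that runs the same trySplit/tryAdvance/trySplit sequence on a spliterator created directly from an array via Arrays.spliterator. The class and method names are hypothetical and exist only for this illustration. It collects the remaining elements and checks that they form a consecutive run, which is the behavior expected from an ordered spliterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

// Hypothetical class name, used only for this illustration.
class ArraySpliteratorCheck {

    // Same call sequence as in the question, but on an array-backed spliterator.
    static List<Integer> remainingAfterInterleaving() {
        Integer[] input = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        Spliterator<Integer> spliterator = Arrays.spliterator(input);
        spliterator.trySplit();           // split off a prefix
        spliterator.tryAdvance(s -> {});  // consume one element
        spliterator.trySplit();           // split again after traversal has started
        List<Integer> rest = new ArrayList<>();
        spliterator.forEachRemaining(rest::add);
        return rest;
    }

    // True if every element is exactly one greater than its predecessor.
    static boolean isConsecutive(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i) != values.get(i - 1) + 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> rest = remainingAfterInterleaving();
        System.out.println(rest + " consecutive: " + isConsecutive(rest));
    }
}
```

Swapping the array spliterator for the parallel flatMap spliterator from the question should make the same check fail on an affected JDK.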

So the question is: did I hit a bug, or is it explicitly forbidden somewhere to use the spliterator in this way?

From the documentation of Spliterator.trySplit():

This method may return null for any reason, including emptiness, inability to split after traversal has commenced, data structure constraints, and efficiency considerations.

(emphasis mine, on "inability to split after traversal has commenced")

So the documentation explicitly mentions the possibility of attempting to split after traversal has commenced, and suggests that spliterators which are unable to handle this may return null.

So for ordered spliterators, the observed behavior should be considered a bug, as described by Misha. Generally, the fact that trySplit() has to return a prefix spliterator, in other words, has to hand over all intermediate state regarding the next items to the new spliterator, is a peculiarity of the Spliterator API that makes bugs likely. I took this question as a motive for checking my own spliterator implementations and found a similar bug…

From what I can see in the source of AbstractWrappingSpliterator and company, when you call tryAdvance, the output of flatMap (4, 5, 6) gets buffered and then 4 gets consumed, leaving (5, 6) in the buffer. Then trySplit correctly splits off (7, 8) to the new Spliterator, leaving 9 in the old one, but the buffered (5, 6) stay with the old Spliterator.

So this looks like a bug to me. It should either hand the buffer off to the new Spliterator, or return null and refuse to split if the buffer is not empty.
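The second option (refuse to split once traversal has started) can be sketched on the caller's side as a delegating wrapper. This is only an illustration under stated assumptions, not the JDK's internal code: the class name NoSplitAfterAdvance and its demo method are hypothetical, and the wrapper uses the slightly stricter rule "never split after any advance" instead of inspecting a buffer it cannot see.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;

// Hypothetical wrapper: delegates everything, but returns null from trySplit()
// once traversal has begun, as the Spliterator documentation permits.
final class NoSplitAfterAdvance<T> implements Spliterator<T> {
    private final Spliterator<T> delegate;
    private boolean traversalStarted;

    NoSplitAfterAdvance(Spliterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        traversalStarted = true;
        return delegate.tryAdvance(action);
    }

    @Override
    public void forEachRemaining(Consumer<? super T> action) {
        traversalStarted = true;
        delegate.forEachRemaining(action);
    }

    @Override
    public Spliterator<T> trySplit() {
        if (traversalStarted) {
            return null; // refuse to split after traversal has commenced
        }
        Spliterator<T> prefix = delegate.trySplit();
        return prefix == null ? null : new NoSplitAfterAdvance<>(prefix);
    }

    @Override
    public long estimateSize() {
        return delegate.estimateSize();
    }

    @Override
    public int characteristics() {
        return delegate.characteristics();
    }

    @Override
    public Comparator<? super T> getComparator() {
        return delegate.getComparator();
    }

    // Replays the question's call sequence through the wrapper.
    static List<Integer> demo() {
        Integer[][] input = { { 1 }, { 2, 3 }, { 4, 5, 6 }, { 7, 8 }, { 9 } };
        Spliterator<Integer> source = Arrays.stream(input).parallel()
                .flatMap(Arrays::stream).spliterator();
        Spliterator<Integer> guarded = new NoSplitAfterAdvance<>(source);
        guarded.trySplit();           // allowed: nothing consumed yet
        guarded.tryAdvance(s -> {});  // traversal starts here
        guarded.trySplit();           // returns null, so no elements are lost
        List<Integer> rest = new ArrayList<>();
        guarded.forEachRemaining(rest::add);
        return rest;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off is lost parallelism after the first advance. Handing the buffered elements over to the prefix spliterator would preserve splitting, but that requires access to the wrapped spliterator's internal state.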

This behavior was officially recognized as a bug (see JDK-8148838), fixed by me and pushed into the JDK 9 trunk (see changeset). The sad thing is that my initial patch actually fixed splitting after flatMap (see webrev), but it was declined because this scenario (calling trySplit() after tryAdvance()) was considered uncommon and discouraged. The currently accepted solution is to disable WrappingSpliterator splitting after advance altogether, which is enough to fix the problem.
