
Doesn't Stream.parallel() update the characteristics of spliterator?

This question is based on the answers to the question "What is the difference between Stream.of and IntStream.range?"

Since IntStream.range produces an already sorted stream, the code below prints only 0:

IntStream.range(0, 4)
         .peek(e -> System.out.println(e))
         .sorted()
         .findFirst();

The spliterator also has the SORTED characteristic. The code below prints true:

System.out.println(
    IntStream.range(0, 4)
             .spliterator()
             .hasCharacteristics(Spliterator.SORTED)
);

Now, if I introduce parallel() into the first snippet, then as expected the output contains all four numbers from 0 to 3, but in a random order, because the stream is no longer sorted due to parallel().

IntStream.range(0, 4)
         .parallel()
         .peek(e -> System.out.println(e))
         .sorted()
         .findFirst();

This prints something like the following, in any order:

2
0
1
3

So I expected the SORTED property to have been removed by parallel(). But the code below prints true as well.

System.out.println(
    IntStream.range(0, 4)
             .parallel()
             .spliterator()
             .hasCharacteristics(Spliterator.SORTED)
);

Why doesn't parallel() change the SORTED property? And since all four numbers are printed, how does Java realize that the stream is not sorted even though the SORTED property still exists?

How exactly this is done is very much an implementation detail. You will have to dig deep into the source code to really see why. Basically, parallel and sequential pipelines are just handled differently. Look at AbstractPipeline.evaluate, which checks isParallel() and then does different things depending on whether the pipeline is parallel:

    return isParallel()
           ? terminalOp.evaluateParallel(this, sourceSpliterator(terminalOp.getOpFlags()))
           : terminalOp.evaluateSequential(this, sourceSpliterator(terminalOp.getOpFlags()));

If you then look at SortedOps.OfInt, you'll see that it overrides two methods:

@Override
public Sink<Integer> opWrapSink(int flags, Sink sink) {
    Objects.requireNonNull(sink);

    if (StreamOpFlag.SORTED.isKnown(flags))
        return sink;
    else if (StreamOpFlag.SIZED.isKnown(flags))
        return new SizedIntSortingSink(sink);
    else
        return new IntSortingSink(sink);
}

@Override
public <P_IN> Node<Integer> opEvaluateParallel(PipelineHelper<Integer> helper,
                                               Spliterator<P_IN> spliterator,
                                               IntFunction<Integer[]> generator) {
    if (StreamOpFlag.SORTED.isKnown(helper.getStreamAndOpFlags())) {
        return helper.evaluate(spliterator, false, generator);
    }
    else {
        Node.OfInt n = (Node.OfInt) helper.evaluate(spliterator, true, generator);

        int[] content = n.asPrimitiveArray();
        Arrays.parallelSort(content);

        return Nodes.node(content);
    }
}

opWrapSink is eventually called for a sequential pipeline, and opEvaluateParallel (as its name suggests) is called for a parallel stream. Notice that opWrapSink does nothing to the given sink if the pipeline is already sorted (it just returns the sink unchanged), whereas opEvaluateParallel always evaluates the spliterator. Evaluating it pushes every element through the upstream stages, which is why peek prints all four numbers in the parallel case.
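
To make the difference between the two paths observable, here is a minimal sketch that records what peek sees instead of printing it. The class name PeekTraceDemo and the helper peeked are made up for this illustration; only the stream calls come from the question.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the JDK.
class PeekTraceDemo {

    // Runs range(0, 4).peek(...).sorted().findFirst() and returns the
    // elements that peek observed, in the order they were observed.
    static List<Integer> peeked(boolean parallel) {
        // Thread-safe list, because peek may run on several threads.
        List<Integer> seen = new CopyOnWriteArrayList<>();
        IntStream source = IntStream.range(0, 4);
        if (parallel) {
            source = source.parallel();
        }
        source.peek(e -> seen.add(e)).sorted().findFirst();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("sequential peeked: " + peeked(false));
        System.out.println("parallel peeked:   " + peeked(true));
    }
}
```

With the implementation quoted above, the sequential run should record only 0, while the parallel run should record more elements, in an order that can change from run to run.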

Also note that parallel-ness and sorted-ness are not mutually exclusive. You can have a stream with any combination of those characteristics.
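
As a small illustration of that independence, the sketch below reports both properties for a few streams. The class ParallelSortedDemo and its describe helper are hypothetical names introduced here; isParallel(), spliterator() and hasCharacteristics() are the standard API calls.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the JDK.
class ParallelSortedDemo {

    // Reports whether the stream is parallel and whether its
    // spliterator carries the SORTED characteristic.
    static String describe(IntStream stream) {
        boolean parallel = stream.isParallel();
        // spliterator() is a terminal operation, so it comes last.
        boolean sorted = stream.spliterator().hasCharacteristics(Spliterator.SORTED);
        return "parallel=" + parallel + ", sorted=" + sorted;
    }

    public static void main(String[] args) {
        System.out.println(describe(IntStream.range(0, 4)));
        System.out.println(describe(IntStream.range(0, 4).parallel()));
        System.out.println(describe(IntStream.of(3, 1, 2)));
        System.out.println(describe(IntStream.of(3, 1, 2).parallel()));
    }
}
```

The four calls in main cover the four combinations: range is reported as sorted whether or not parallel() was called, and the array-backed IntStream.of(3, 1, 2) is not.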

"Sorted" is a characteristic of a Spliterator. It's not technically a characteristic of a Stream (the way "parallel" is). Sure, parallel could create a stream with an entirely new spliterator (one that gets its elements from the original spliterator) with entirely new characteristics, but why do that when you can just reuse the same spliterator? I'd imagine you would have to handle parallel and sequential streams differently in any case.

You need to take a step back and think about how you would solve such a problem in general, considering that ForkJoinPool is used for parallel streams and that it is based on work stealing. It would be very helpful if you knew how a Spliterator works, too. Some details here.

You have a certain Stream, you "split it" (very simplified) into smaller pieces and hand all those pieces to a ForkJoinPool for execution. Each piece is worked on independently by an individual thread. Since we are talking about threads here, there is no guaranteed sequence of events; the scheduling is nondeterministic (that is why you see the output in a random order).

If your stream preserves the order, the terminal operation is supposed to preserve it too. So while intermediate operations are executed in any order, your terminal operation (if the stream up to that point is ordered) will handle elements in an ordered fashion. To put it slightly simplified:

System.out.println(
    IntStream.of(1,2,3)
             .parallel()
             .map(x -> {System.out.println(x * 2); return x * 2;})
             .boxed()
             .collect(Collectors.toList()));

map will process elements in an unknown order (ForkJoinPool and threads, remember), but collect will receive elements in order, "left to right".
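
A quick way to check the "left to right" claim is to compare the collected list with the expected one. This is only a sketch; OrderedCollectDemo and doubledInParallel are names invented for the example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the JDK.
class OrderedCollectDemo {

    // Doubles 1..n on a parallel stream and collects into a list.
    // map may run on any thread in any order, but the ordered source
    // makes collect assemble the result in encounter order.
    static List<Integer> doubledInParallel(int n) {
        return IntStream.rangeClosed(1, n)
                        .parallel()
                        .map(x -> x * 2)
                        .boxed()
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubledInParallel(5));
    }
}
```

Because the source is ordered, the list for n = 5 is 2, 4, 6, 8, 10 no matter which thread handled which element.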


Now, if we extrapolate that to your example: when you invoke parallel, the stream is split into small pieces that are worked on separately. For example, look at how this one is split (a single time):

Spliterator<Integer> spliterator =
IntStream.of(5, 4, 3, 2, 1, 5, 6, 7, 8)
         .parallel()
         .boxed()
         .sorted()
         .spliterator()
         .trySplit(); // trySplit is invoked internally on parallel

spliterator.forEachRemaining(System.out::println);

On my machine it prints 1, 2, 3, 4. This means that the internal implementation splits the stream into two Spliterators: left and right. left has [1, 2, 3, 4] and right has the remaining elements, [5, 5, 6, 7, 8]. But that is not all: these Spliterators can be split further. For example:

Spliterator<Integer> spliterator =
IntStream.of(5, 4, 3, 2, 1, 5, 6, 7, 8)
         .parallel()
         .boxed()
         .sorted()
         .spliterator()
         .trySplit()
         .trySplit()
         .trySplit();

spliterator.forEachRemaining(System.out::println);

If you try to invoke trySplit again, you will get null, meaning: that's it, I can't split anymore.
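
The same "split until null" behaviour can be sketched with a loop. To keep the example small it uses a plain array spliterator from Arrays.spliterator as a stand-in for the stream's spliterator, which is an assumption of this sketch; SplitUntilNullDemo and countPrefixSplits are hypothetical names.

```java
import java.util.Arrays;
import java.util.Spliterator;

// Hypothetical demo class, not part of the JDK.
class SplitUntilNullDemo {

    // Repeatedly splits off the prefix returned by trySplit() and
    // continues with that prefix until trySplit() returns null.
    // Returns the number of successful splits.
    static int countPrefixSplits(int[] data) {
        Spliterator.OfInt current = Arrays.spliterator(data);
        int splits = 0;
        Spliterator.OfInt prefix;
        while ((prefix = current.trySplit()) != null) {
            current = prefix; // keep going with the left-hand piece
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 5, 6, 7, 8};
        System.out.println("successful splits: " + countPrefixSplits(sorted));
    }
}
```

With OpenJDK's array spliterator, which halves its range on every split, nine elements allow three successful splits before null is returned, matching the three chained trySplit calls above.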

So your Stream, IntStream.range(0, 4), can be split into as many as 4 spliterators, each worked on individually by some thread. If the first thread knows that the Spliterator it is currently working on is the "left-most" one, that's it. The rest of the threads do not even need to start their work: the result is known.

On the other hand, it could be that the Spliterator holding the "left-most" element is only started last. So the first three might already be done with their work (thus peek is invoked in your example), but they do not "produce" the needed result.
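
Whichever piece happens to finish first, findFirst on an ordered stream still has to report the left-most match. A minimal sketch of that guarantee follows; FindFirstDemo and alwaysLeftmost are names made up for the example, and the filter threshold is arbitrary.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the JDK.
class FindFirstDemo {

    // Repeats a parallel findFirst several times and checks that the
    // result is always the first match in encounter order (11),
    // regardless of how the work was scheduled across threads.
    static boolean alwaysLeftmost(int runs) {
        for (int i = 0; i < runs; i++) {
            OptionalInt first = IntStream.range(0, 1_000)
                                         .parallel()
                                         .filter(x -> x > 10)
                                         .findFirst();
            if (!first.isPresent() || first.getAsInt() != 11) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("always left-most: " + alwaysLeftmost(100));
    }
}
```

Threads working on pieces further to the right may find matches earlier, but those matches are discarded in favour of the one from the left-most piece.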

As a matter of fact, this is how it is done internally. You do not need to understand the code, but the flow and the method names should be obvious.
