Understanding sequential vs parallel stream spliterators in Java 8 and Java 9

A question about spliterators that at first glance is not straightforward.

In streams, .parallel() changes how the stream is processed. However, I was expecting the spliterators created from sequential and parallel streams to be the same. For example, in sequential streams .trySplit() is typically never invoked, while in parallel streams it is, in order to hand over the split spliterator to another thread.

Differences between stream.spliterator() vs stream.parallel().spliterator() :

  1. They may have different characteristics:

     Stream.of(1L, 2L, 3L).limit(2);            // ORDERED
     Stream.of(1L, 2L, 3L).limit(2).parallel(); // SUBSIZED, SIZED, ORDERED

This looks like another inconsistency in how stream spliterator characteristics are reported (the parallel case seems better calculated), as discussed here: Understanding deeply spliterator characteristics in java 8 and java 9

  2. They may behave differently when split with .trySplit() :

     Stream.of(1L, 2L, 3L);                     // NON NULL
     Stream.of(1L, 2L, 3L).limit(2);            // NULL
     Stream.of(1L, 2L, 3L).limit(2).parallel(); // NON NULL

Why do the last two behave differently? Why can't I split a sequential stream if I want to? (It could be useful to discard one of the splits for faster processing, for example.)

  3. Big impact when transforming a spliterator back to a stream:

     Spliterator<Long> spliterator = Stream.of(1L, 2L, 3L).limit(2).spliterator();
     Stream<Long> stream = StreamSupport.stream(spliterator, true); // No parallel processing!

In this case, the spliterator was created from a sequential stream, which disables the ability to split ( .trySplit() returns null). When it later needs to be transformed back into a stream, that stream won't benefit from parallel processing. A shame.
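The splitting behaviour listed above can be checked with a small probe. This is only a sketch: the class name SplitProbe and its helper methods are made up for illustration, and the results depend on the JDK's internal wrapping spliterator, so they may differ on other versions.

```java
import java.util.Spliterator;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, not part of any library.
class SplitProbe {

    /** Returns true if the stream's spliterator hands out a non-null split. */
    static boolean canSplit(Stream<?> stream) {
        return stream.spliterator().trySplit() != null;
    }

    /**
     * Rebuilds a stream flagged as parallel from the spliterator of a
     * sequential limit() pipeline and reports whether the stream claims to be
     * parallel while its underlying spliterator still refuses to split.
     */
    static boolean parallelFlagButNoSplit() {
        Spliterator<Long> sp = Stream.of(1L, 2L, 3L).limit(2).spliterator();
        Stream<Long> rebuilt = StreamSupport.stream(sp, true);
        // With no chained operations, spliterator() hands back the source spliterator.
        return rebuilt.isParallel() && rebuilt.spliterator().trySplit() == null;
    }

    public static void main(String[] args) {
        System.out.println("plain source splits:     " + canSplit(Stream.of(1L, 2L, 3L)));
        System.out.println("sequential limit splits: " + canSplit(Stream.of(1L, 2L, 3L).limit(2)));
        System.out.println("parallel limit splits:   " + canSplit(Stream.of(1L, 2L, 3L).limit(2).parallel()));
        System.out.println("parallel flag, no split: " + parallelFlagButNoSplit());
    }
}
```

The last check illustrates point 3: the rebuilt stream reports isParallel() as true, but there is nothing for other threads to work on because the source never splits.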

The big question: As a workaround, what are the major impacts of always transforming a stream to parallel before invoking .spliterator() ?

// Supports activation of parallel processing later
public static <T> Stream<T> myOperation(Stream<T> stream) {
    boolean isParallel = stream.isParallel();
    Spliterator<T> spliterator = stream.parallel().spliterator();
    return StreamSupport.stream(new Spliterator<T>() {
        // My implementation of the interface here (omitted for clarity)
    }, isParallel).onClose(stream::close);
}

// Now I have the option to use parallel processing when needed:
myOperation(stream).skip(1).parallel()...
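To make the workaround concrete, here is a minimal sketch with the omitted implementation filled in by a hypothetical pass-through spliterator (the names PassThroughDemo and Delegating are assumptions for illustration; a real operation would add its own logic in tryAdvance and trySplit).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical demo class, not part of any library.
class PassThroughDemo {

    /** Pass-through wrapper that forwards every call to the wrapped spliterator. */
    static final class Delegating<T> implements Spliterator<T> {
        private final Spliterator<T> source;

        Delegating(Spliterator<T> source) {
            this.source = source;
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            return source.tryAdvance(action);
        }

        @Override
        public Spliterator<T> trySplit() {
            // Wrap the split-off prefix so the custom logic would apply to both halves.
            Spliterator<T> prefix = source.trySplit();
            return prefix == null ? null : new Delegating<>(prefix);
        }

        @Override
        public long estimateSize() {
            return source.estimateSize();
        }

        @Override
        public int characteristics() {
            return source.characteristics();
        }

        @Override
        public Comparator<? super T> getComparator() {
            // Needed when the source reports SORTED; the default implementation throws.
            return source.getComparator();
        }
    }

    /** Same shape as the method in the question, with the pass-through wrapper plugged in. */
    static <T> Stream<T> myOperation(Stream<T> stream) {
        boolean isParallel = stream.isParallel();
        Spliterator<T> spliterator = stream.parallel().spliterator();
        return StreamSupport.stream(new Delegating<>(spliterator), isParallel)
                            .onClose(stream::close);
    }

    public static void main(String[] args) {
        List<Integer> result = myOperation(Stream.of(1, 2, 3).map(x -> x))
                .parallel()
                .collect(Collectors.toList());
        System.out.println(result);
    }
}
```

Because the inner spliterator is obtained after parallel() , the wrapper's trySplit() can return non-null splits even though the incoming stream was sequential and had a chained operation.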

This is not a general property of spliterators, but only of wrapping spliterators encapsulating a stream pipeline.

When you are calling spliterator() on a stream that has been generated from a spliterator and has no chained operation, you'll get the source spliterator which may or may not support trySplit , regardless of the stream parallel state.

ArrayList<String> list = new ArrayList<>();
Collections.addAll(list, "foo", "bar", "baz");
Spliterator<String> sp1 = list.spliterator(), sp2=list.stream().spliterator();
// true
System.out.println(sp1.getClass()==sp2.getClass());
// not null
System.out.println(sp2.trySplit());

Likewise:

Spliterator<String> sp = Stream.of("foo", "bar", "baz").spliterator();
// not null
System.out.println(sp.trySplit());

But as soon as you chain operations before calling spliterator() , you will get a spliterator wrapping the stream pipeline. It would be possible to implement dedicated spliterators performing the associated operation, like a LimitSpliterator or a MappingSpliterator , but this has not been done: converting a stream back to a spliterator is considered a last resort for when the other terminal operations do not fit, not a high-priority use case. Instead, you will always get an instance of a single implementation class that tries to translate the inner workings of the stream pipeline implementation to the spliterator API.

This can be quite complicated for stateful operations, most notably sorted , distinct or skip & limit for a non-SIZED stream. For trivial stateless operations, like map or filter , it would be much easier to provide support, as has even been remarked in a code comment:

Abstract wrapping spliterator that binds to the spliterator of a pipeline helper on first operation. This spliterator is not late-binding and will bind to the source spliterator when first operated on. A wrapping spliterator produced from a sequential stream cannot be split if there are stateful operations present.

 …
 // @@@ Detect if stateful operations are present or not
 //     If not then can split otherwise cannot

 /**
  * True if this spliterator supports splitting
  */
 final boolean isParallel;

but it seems that currently this detection has not been implemented and all intermediate operations are treated like stateful operations.

Spliterator<String> sp = Stream.of("foo", "bar", "baz").map(x -> x).spliterator();
// null
System.out.println(sp.trySplit());

When you try to work around this by always calling parallel , there will be no impact when the stream pipeline consists of stateless operations only. But with a stateful operation, it might change the behavior significantly. E.g., when you have a sorted step, all elements have to be buffered and sorted before you can consume the first element. For a parallel stream, it will likely use a parallelSort , even when you never invoke trySplit .
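Both effects can be observed with a small experiment. This is a sketch only: the class WorkaroundProbe and its methods are hypothetical names, and the counter merely shows how many source elements were pulled through the pipeline, not which sort implementation was used.

```java
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical probe class, not part of any library.
class WorkaroundProbe {

    /** Stateless pipeline: does the spliterator split, with and without parallel()? */
    static boolean mapSplits(boolean parallel) {
        Stream<String> stream = Stream.of("foo", "bar", "baz").map(x -> x);
        Spliterator<String> sp = (parallel ? stream.parallel() : stream).spliterator();
        return sp.trySplit() != null;
    }

    /** Stateful pipeline: how many source elements are consumed to deliver one sorted element? */
    static int consumedBeforeFirst(boolean parallel) {
        AtomicInteger seen = new AtomicInteger(); // thread-safe, peek may run on pool threads
        Stream<Integer> stream = Stream.of(3, 1, 2)
                .peek(x -> seen.incrementAndGet())
                .sorted();
        Spliterator<Integer> sp = (parallel ? stream.parallel() : stream).spliterator();
        sp.tryAdvance(x -> { }); // request a single element only
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("map, sequential, splits: " + mapSplits(false));
        System.out.println("map, parallel, splits:   " + mapSplits(true));
        System.out.println("sorted, sequential, consumed: " + consumedBeforeFirst(false));
        System.out.println("sorted, parallel, consumed:   " + consumedBeforeFirst(true));
    }
}
```

In both modes the sorted step has to see every source element before the first one can be handed out; what changes with parallel() is how that buffering and sorting is carried out.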
