
Aggregate function & Behavior - Parallel stream

Below is an example that uses aggregate functions (filter / map / sorted), a behavior (this::capitalize), and a terminal operation (forEach) on a given stream (Stream.of(...)):

Stream
    .of("horatio", "laertes", "Hamlet" /* , ... */)
    .filter(s -> Character.toLowerCase(s.charAt(0)) == 'h')  // aggregate_function(behavior)
    .map(this::capitalize)
    .sorted()
    .forEach(System.out::println);
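A self-contained, runnable version of the pipeline above might look like the sketch below. It assumes a hypothetical class name HNames, keeps only the three names actually shown (the rest are elided), and collects into a List instead of printing so the result can be checked. The capitalize logic matches the method shown further down.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper class; only the three names shown in the question are used.
class HNames {
    // Same logic as the question's capitalize(); it reads no fields of `this`.
    String capitalize(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return s.substring(0, 1).toUpperCase() + s.substring(1).toLowerCase();
    }

    // The question's pipeline, ending in collect() so the result can be inspected.
    List<String> hNames(Stream<String> names) {
        return names
            .filter(s -> Character.toLowerCase(s.charAt(0)) == 'h') // keep names starting with h or H
            .map(this::capitalize)                                   // behavior passed to map
            .sorted()                                                // natural String order
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        new HNames().hNames(Stream.of("horatio", "laertes", "Hamlet"))
            .forEach(System.out::println); // prints Hamlet, then Horatio
    }
}
```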

To avoid race conditions in parallel streams, I learned that we need to make an extra effort to write the behavior so that it has no side effects (i.e., is stateless), as shown below:

String capitalize(String s) {
    if (s.length() == 0)
        return s;
    return s.substring(0, 1).toUpperCase()
        + s.substring(1).toLowerCase();
}

An aggregate function just applies the behavior to each element generated by the stream.


Since a stream generates its elements one at a time and has no non-transient storage, are aggregate functions, without any extra effort, always pure functions without side effects, so that they never incur race conditions in a parallel stream?

It's a key point of the Stream API that all operations support correct parallel processing as long as your behavioral parameters meet the criteria. Or, as the specification states:

Parallelism

Processing elements with an explicit for-loop is inherently serial. Streams facilitate parallel execution by reframing the computation as a pipeline of aggregate operations, rather than as imperative operations on each individual element. All streams operations can execute either in serial or in parallel.

Except for operations identified as explicitly nondeterministic, such as findAny(), whether a stream executes sequentially or in parallel should not change the result of the computation.

Most stream operations accept parameters that describe user-specified behavior, which are often lambda expressions. To preserve correct behavior, these behavioral parameters must be non-interfering, and in most cases must be stateless. Such parameters are always instances of a functional interface such as Function, and are often lambda expressions or method references.
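The guarantee quoted above can be illustrated with a small sketch. The class name ParallelDemo and the upper-casing step are assumptions chosen for illustration. The behavioral parameters are stored as Predicate and Function instances, both stateless and non-interfering, so the same pipeline yields the same list whether it runs sequentially or in parallel; findFirst() stays deterministic in parallel, whereas findAny() is the operation the specification calls out as nondeterministic.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not taken from the question.
class ParallelDemo {
    // Behavioral parameters are instances of functional interfaces.
    // Both are stateless (they read only their argument) and
    // non-interfering (they never modify the source list).
    static final Predicate<String> STARTS_WITH_H =
        s -> !s.isEmpty() && Character.toLowerCase(s.charAt(0)) == 'h';
    static final Function<String, String> UPPER = String::toUpperCase;

    // The same pipeline, run either sequentially or in parallel.
    static List<String> process(List<String> names, boolean parallel) {
        Stream<String> stream = parallel ? names.parallelStream() : names.stream();
        return stream.filter(STARTS_WITH_H)
                     .map(UPPER)
                     .sorted()
                     .collect(Collectors.toList());
    }

    // findFirst() respects encounter order even on a parallel stream;
    // findAny() would be allowed to return either matching element.
    static Optional<String> firstH(List<String> names) {
        return names.parallelStream().filter(STARTS_WITH_H).findFirst();
    }

    public static void main(String[] args) {
        List<String> names = List.of("horatio", "laertes", "Hamlet");
        System.out.println(process(names, false).equals(process(names, true))); // true
        System.out.println(firstH(names).orElse("none"));                       // horatio
    }
}
```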

Note that the term aggregate operation is more general: it applies to the entire stream operation, including intermediate operations and the terminal operation. What has been said about map and filter applies to reduce as well; the reduction function should be side-effect free and stateless. Note that while collect incorporates mutable state, that state is local and still meets the non-interference criteria. From the outside, you can still view a collect operation as if it were stateless.
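A minimal sketch of both points, assuming a hypothetical class Reductions and the three names from the question. The reduce call uses Integer::sum, which is stateless, side-effect free and associative. The three-argument collect mutates ArrayLists, but each partial list is created by the supplier for one subtask and only merged by the combiner, so no state is shared with the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not taken from the question.
class Reductions {
    // reduce: Integer::sum is stateless and associative,
    // so the parallel result is the same as the sequential one.
    static int totalLength(List<String> names) {
        return names.parallelStream()
                    .map(String::length)
                    .reduce(0, Integer::sum);
    }

    // collect: each subtask fills its own ArrayList from the supplier, and the
    // partial lists are merged in encounter order with addAll. The mutable
    // state is confined to the operation itself.
    static List<String> copy(List<String> names) {
        return names.parallelStream()
                    .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<String> names = List.of("horatio", "laertes", "Hamlet");
        System.out.println(totalLength(names)); // 20
        System.out.println(copy(names));        // [horatio, laertes, Hamlet]
    }
}
```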

You have to look at the specific terminal operation's documentation to find out how a parallel execution might affect the outcome, like the mentioned difference between findFirst and findAny. In your case forEach is problematic, as it may invoke the consumer unordered and concurrently, so there is no guarantee that the elements are printed in the order imposed by the preceding sorted step. You should use forEachOrdered here.
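A sketch of the forEachOrdered fix, assuming a hypothetical class OrderedPrint that takes the PrintStream as a parameter (System.out in normal use) so the output can be captured. On a parallel stream, forEachOrdered performs the action one element at a time in encounter order, so the order established by sorted() is preserved.

```java
import java.io.PrintStream;
import java.util.List;

// Hypothetical demo class, not taken from the question.
class OrderedPrint {
    // Sort in parallel, then print in the order established by sorted().
    static void printSorted(List<String> names, PrintStream out) {
        names.parallelStream()
             .sorted()
             // forEachOrdered acts on one element at a time, in encounter order;
             // .forEach(out::println) would give no ordering guarantee here.
             .forEachOrdered(out::println);
    }

    public static void main(String[] args) {
        printSorted(List.of("Horatio", "Hamlet"), System.out); // prints Hamlet, then Horatio
    }
}
```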

By the way, you could make your capitalize method static to emphasize that it doesn't depend on the state of the this instance, or simplify it to a lambda expression:
s -> s.isEmpty() ? s : s.substring(0, 1).toUpperCase() + s.substring(1).toLowerCase()
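Both suggestions side by side, as a sketch with a hypothetical class Capitalizers. The static method cannot touch instance state, and the lambda is stored as a UnaryOperator (an assumption; any Function&lt;String, String&gt; would do), so either one can be passed to map as .map(Capitalizers::capitalize) or .map(Capitalizers.CAPITALIZE).

```java
import java.util.function.UnaryOperator;

// Hypothetical holder class for the two equivalent forms.
class Capitalizers {
    // static: makes it explicit that no instance state is read.
    static String capitalize(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return s.substring(0, 1).toUpperCase() + s.substring(1).toLowerCase();
    }

    // The equivalent lambda expression, stored as a functional-interface instance.
    static final UnaryOperator<String> CAPITALIZE =
        s -> s.isEmpty() ? s : s.substring(0, 1).toUpperCase() + s.substring(1).toLowerCase();

    public static void main(String[] args) {
        for (String s : new String[] {"hAMLET", "horatio", ""}) {
            // Both forms agree on every input, including the empty string.
            System.out.println(capitalize(s).equals(CAPITALIZE.apply(s))); // true
        }
    }
}
```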
