
Java 8 Stream: Iterating, Processing and Count

Is it OK to process and count the processed data this way?

long count = userDao.findApprovedWithoutData().parallelStream().filter(u -> {
    Data d = dataDao.findInfoByEmail(u.getEmail());
    boolean ret = false;
    if (d != null) {
        String result = "";
        result += getFieldValue(d::getName, ". \n");
        result += getFieldValue(d::getOrganization, ". \n");
        result += getFieldValue(d::getAddress, ". \n");
        if(!result.isEmpty()) {
            u.setData(d.getInfo());
            userDao.update(u);
            ret = true;
        }
    }
    return ret;
}).count();

So, in short: iterate over incomplete records, update each one if data is present, and count how many records were updated?

IMHO this is bad code, because:

The filter predicate has (quite significant) side effects

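Predicates should not have side effects (just as getters shouldn't). A side effect hidden in a predicate is unexpected, and that makes the code hard to reason about. The java.util.stream documentation also discourages side effects in behavioral parameters such as the lambda passed to filter.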

The filter predicate is very inefficient

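Each execution of the predicate fires its own database queries (a lookup by email and, when data is found, an update), so the number of round trips grows with the number of users. That makes this code not scalable.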

At first glance, the main purpose seems to be getting a count, but really that's a minor (dispensable) bit of info

Good code makes it obvious what is going on (unlike this code).
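A minimal sketch of one way to make the intent obvious, assuming hypothetical in-memory stand-ins for `User`, `Data` and the DAOs (none of these types come from the question): the stream only decides which updates are needed and mutates nothing, the updates are applied in a separate, plainly visible loop, and the count is simply the number of updates applied. `hasAnyField` is an assumed approximation of the question's `getFieldValue` concatenation.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class UserDataSync {
    // Hypothetical stand-ins for the question's entities (not the asker's real classes).
    static final class User {
        final String email;
        String data;
        User(String email) { this.email = email; }
    }

    static final class Data {
        final String name, organization, address, info;
        Data(String name, String organization, String address, String info) {
            this.name = name;
            this.organization = organization;
            this.address = address;
            this.info = info;
        }

        // Assumed equivalent of the question's "result is not empty" check.
        boolean hasAnyField() {
            return notEmpty(name) || notEmpty(organization) || notEmpty(address);
        }

        private static boolean notEmpty(String s) { return s != null && !s.isEmpty(); }
    }

    // A pending change: which user should receive which info.
    static final class Update {
        final User user;
        final String info;
        Update(User user, String info) { this.user = user; this.info = info; }
    }

    // Step 1, side-effect free: decide what should change.
    static List<Update> planUpdates(List<User> users, Map<String, Data> dataByEmail) {
        return users.stream()
                .flatMap(u -> {
                    Data d = dataByEmail.get(u.email); // stand-in for dataDao.findInfoByEmail
                    return d != null && d.hasAnyField()
                            ? Stream.of(new Update(u, d.info))
                            : Stream.<Update>empty();
                })
                .collect(Collectors.toList());
    }

    // Step 2, explicit side effects: apply each change; the count is a by-product.
    static int applyUpdates(List<Update> updates, List<User> saved) {
        for (Update up : updates) {
            up.user.data = up.info;
            saved.add(up.user); // stand-in for userDao.update(u)
        }
        return updates.size();
    }
}
```

Keeping the lookup and the update in separate steps means a reader sees the update loop directly instead of discovering it inside a `filter`.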

You should change the code to use a single (fairly simple) update query that employs a join, and get the count from the "number of rows updated" value that the persistence API returns.
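A minimal JDBC sketch of that suggestion. The table and column names (`users`, `user_data`, `email`, `info`, `approved`, `data`, `name`, `organization`, `address`) are hypothetical, the `UPDATE ... JOIN` form is MySQL syntax (other databases spell the join differently, e.g. PostgreSQL uses `UPDATE ... FROM`), and the non-empty check only approximates the question's `getFieldValue` logic. `PreparedStatement.executeUpdate()` returns the row count reported by the driver, which takes the place of the stream's `count()`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class ApprovedUserDataUpdater {
    // Hypothetical schema; MySQL-style multi-table UPDATE with a JOIN.
    // The last condition assumes "at least one of name/organization/address is present".
    static final String SQL =
            "UPDATE users u "
          + "JOIN user_data d ON d.email = u.email "
          + "SET u.data = d.info "
          + "WHERE u.approved = TRUE "
          + "AND u.data IS NULL "
          + "AND (COALESCE(d.name, '') <> '' "
          + "OR COALESCE(d.organization, '') <> '' "
          + "OR COALESCE(d.address, '') <> '')";

    // Runs the whole job in one statement and one round trip; returns the updated-row count.
    static int updateAndCount(Connection connection) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(SQL)) {
            return ps.executeUpdate();
        }
    }
}
```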

It depends on your definition of "process". I cannot give you a clear yes or no, because it is hard to conclude without understanding your code and how it is implemented.

You are using a parallel stream: the Java runtime splits the stream into substreams, based on the number of threads available in the ForkJoinPool common pool, and processes them concurrently.
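A small sketch that makes this visible by recording which threads run the lambda of a parallel stream. `ForkJoinPool.commonPool()` and `getParallelism()` are standard `java.util.concurrent` APIs; the class and method names are made up for illustration. Typically the recorded names are the calling thread plus `ForkJoinPool.commonPool-worker-...` threads, and the exact set varies between runs and machines.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

final class CommonPoolDemo {
    // Returns the names of all threads that executed the forEach lambda.
    static Set<String> threadsUsed(int size) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, size)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("threads used: " + threadsUsed(10_000));
    }
}
```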

When using parallel streams, you need to be careful about possible side effects:

  1. Interference (lambda expressions in a stream should not interfere)

Lambda expressions in stream operations should not interfere. Interference occurs when the source of a stream is modified while a pipeline processes the stream.

  2. Stateful lambda expressions

Avoid using stateful lambda expressions as parameters in stream operations. A stateful lambda expression is one whose result depends on any state that might change during the execution of a pipeline.
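A minimal sketch of a stateful lambda, using a hypothetical numbering task: the label each element gets depends on a shared counter, so in a parallel stream the numbers may be handed out in processing order rather than encounter order, and the result may differ from run to run. The stateless variant derives the number from the element's index instead.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class StatefulLambdaDemo {
    // Don't do this: each label depends on a counter shared by every invocation,
    // i.e. on state that changes while the pipeline runs.
    static List<String> numberItems(List<String> items, boolean parallel) {
        AtomicInteger counter = new AtomicInteger();
        Stream<String> stream = parallel ? items.parallelStream() : items.stream();
        return stream
                .map(item -> counter.incrementAndGet() + ":" + item) // stateful lambda
                .collect(Collectors.toList());
    }

    // Stateless alternative: the number is derived from the element's position,
    // so the result is the same for sequential and parallel execution.
    static List<String> numberItemsStateless(List<String> items) {
        return IntStream.range(0, items.size())
                .parallel()
                .mapToObj(i -> (i + 1) + ":" + items.get(i))
                .collect(Collectors.toList());
    }
}
```

The `AtomicInteger` makes the counter thread-safe, but that only prevents lost increments; it does not make the outcome deterministic.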

Applying these points to your question:

Non-interference: the documentation strongly states that lambda expressions should not modify the source of the stream while the pipeline is executing (unless the source is concurrent), because doing so can cause:

  • Exceptions (e.g. ConcurrentModificationException)
  • Incorrect answers
  • Nonconformant behaviour
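A minimal sketch of both cases, similar to the interference example in Oracle's Java Tutorials (the class and method names are made up): the lambda adds to the very list being streamed. With an `ArrayList` source this is detected and a `ConcurrentModificationException` is thrown; with a `CopyOnWriteArrayList` source, the stream traverses a snapshot taken when `stream()` is called, so nothing is thrown and the additions do not appear in the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class InterferenceDemo {
    // Modifies the stream's own source while the pipeline runs.
    static String concatWhileAdding(List<String> source) {
        return source.stream()
                .peek(s -> source.add("three")) // interference: structural change of the source
                .reduce((a, b) -> a + " " + b)
                .get();
    }

    public static void main(String[] args) {
        try {
            concatWhileAdding(new ArrayList<>(Arrays.asList("one", "two")));
        } catch (ConcurrentModificationException e) {
            System.out.println("ArrayList source: " + e);
        }
        // CopyOnWriteArrayList streams over a snapshot taken when stream() is called,
        // so the additions neither throw nor show up in this result.
        System.out.println("CopyOnWriteArrayList source: "
                + concatWhileAdding(new CopyOnWriteArrayList<>(Arrays.asList("one", "two"))));
    }
}
```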

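The exception is well-behaved stream sources: they can be modified before the terminal operation starts (for example, after intermediate operations such as filter have been declared), and those modifications are reflected in the result. Read more here.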
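The java.util.stream package documentation illustrates this with a late-binding `ArrayList` source. Below is a sketch along the same lines (class and method names made up); the `filter` is added only to show that declaring an intermediate operation does not traverse the source yet. By contrast, the `filter` lambda in the question only runs once `count()` starts, so any source modification it makes happens during pipeline execution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LateBindingDemo {
    static String modifyBeforeTerminalOp() {
        List<String> list = new ArrayList<>(Arrays.asList("one", "two"));
        Stream<String> stream = list.stream()
                .filter(s -> !s.isEmpty());                 // declared, not executed yet (lazy)
        list.add("three");                                  // modified before the terminal operation
        return stream.collect(Collectors.joining(" "));     // traversal starts only here
    }
}
```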

Your lambda expression may interfere with the source of the stream, which is not advised. The interference sits inside an intermediate operation (filter), but intermediate operations are lazy: the filter lambda actually runs while count() executes the pipeline, so any modification of the source it makes happens during pipeline execution. You might therefore reconsider your lambda expression with respect to interference. Whether it really interferes also depends on how userDao.update changes the source of the stream, which is not clear from your question.

Stateful lambda expressions: your lambda expression does not seem to be stateful, because its result depends on values that do not change during the execution of the pipeline. So this point does not apply to your case.

I advise you to go through the Java 8 Stream documentation, as well as this blog, which explains Java 8 Streams really well with examples.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM