
Process certain stream elements last

I have to implement an interface that expects a Stream as its response. Some elements in my source are missing data, and I have to use other elements in the source to find it. The source is too large to hold all of its elements in memory. I can write a routine to find the missing data, but only if the elements that are missing data are processed last.

Here is a simplified example of my attempt to solve this. In this case I am trying to set aside the element 30 so that it is processed at the end, after an additional addOne step. However, I get a ConcurrentModificationException when the program attempts to read from the list's stream.

package test;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class TestStreams {
    private static List<Integer> savedForLater = new ArrayList<>();

    public static void main(String[] args) {
        Stream<Integer> origStream = Stream.of(10, 20, 30, 40, 50).filter(
                i -> saveThirtyForLater(i));
        Stream<Integer> savedForLaterStream = savedForLater.stream().map(
                i -> addOne(i));

        // Exception
        Stream.concat(origStream, savedForLaterStream).forEach(
            i -> System.out.println(i));

        // No Exception
        // origStream.forEach(i -> System.out.println(i));
        // savedForLaterStream.forEach(i -> System.out.println(i));
    }

    private static Integer addOne(Integer i) {
        return i + 1; // autoboxed; the Integer(int) constructor is deprecated
    }

    private static boolean saveThirtyForLater(Integer i) {
        if (i == 30) {
            savedForLater.add(i);
            return false;
        }
        return true;
    }
}

This code produces the following result:

10
20
40
50
Exception in thread "main" java.util.ConcurrentModificationException
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1380)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
    at java.util.stream.StreamSpliterators$WrappingSpliterator.forEachRemaining(StreamSpliterators.java:312)
    at java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:742)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at test.TestStreams.main(TestStreams.java:17)

I have tried using a thread-safe list, but it doesn't produce the desired result either.

Per the Javadoc, Stream.concat "creates a lazily concatenated stream whose elements are all the elements of the first stream followed by all the elements of the second stream."

As I understand it, concat should not touch the list's stream until it pulls an element from it, at which point the list is no longer changing.

If all else fails I could read the file twice, but I would really like to know why this doesn't work, and if anyone has an alternate idea about manipulating the stream to avoid a second pass.

Streams are lazy. Until you invoke a terminal operation such as forEach or collect, intermediate operations (such as filter or map) are not executed.

Stream<Integer> origStream = Stream.of(10, 20, 30, 40, 50).filter(
        i -> saveThirtyForLater(i));

After the line above executes, your savedForLater list is still unchanged. It is modified only once a terminal operation is invoked on this stream.
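A minimal sketch of this laziness, reusing the question's names. The class LazyFilterDemo and the method sizesBeforeAndAfter are hypothetical, introduced only for illustration. The list stays empty after the pipeline is built and is populated only once a terminal operation runs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class LazyFilterDemo {
    /** Returns the size of savedForLater before and after the terminal operation. */
    static int[] sizesBeforeAndAfter() {
        List<Integer> savedForLater = new ArrayList<>();

        // Building the pipeline runs nothing: filter is an intermediate operation.
        Stream<Integer> origStream = Stream.of(10, 20, 30, 40, 50).filter(i -> {
            if (i == 30) {
                savedForLater.add(i); // side effect, as in the question
                return false;
            }
            return true;
        });
        int before = savedForLater.size();

        // The terminal operation is what pulls elements through the filter.
        origStream.forEach(i -> { });
        int after = savedForLater.size();

        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] sizes = sizesBeforeAndAfter();
        System.out.println(sizes[0] + " -> " + sizes[1]); // prints "0 -> 1"
    }
}
```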

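In your final expression, Stream.concat(origStream, savedForLaterStream).forEach(i -> System.out.println(i)), the terminal operation forEach applies to both origStream and savedForLaterStream. Traversing origStream modifies the savedForLater list, but in the OpenJDK implementation the spliterator over that list was already obtained and bound to its then-empty state when concat was called. When the concatenated stream moves on to savedForLaterStream, the list's fail-fast spliterator detects the modification, which is why you get a ConcurrentModificationException.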

Modifying a field inside the filter predicate is a very bad approach, and it violates the contract of the filter method. From its Javadoc:

predicate - a non-interfering, stateless predicate to apply to each element to determine if it should be included

Your predicate saveThirtyForLater is not stateless, because it modifies the savedForLater list.

Solution:

Instead of using concat, you can process these streams separately, one after the other:

origStream.forEach(i -> System.out.println(i));
savedForLaterStream.forEach(i -> System.out.println(i));

This yields the desired result:

10
20
40
50
31

You cannot do this trick with concat because it defeats late binding: concat requests the sizes of both streams as soon as it is invoked, so the number of elements saved for later would have to be known in advance. It is possible with flatMap, however, which preserves late binding:

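// Requires: import java.util.function.Function;
public static void main(String[] args) {
    Stream<Integer> origStream = Stream.of(10, 20, 30, 40, 50).filter(
            i -> saveThirtyForLater(i));
    Stream<Integer> savedForLaterStream = savedForLater.stream().map(
            i -> addOne(i));

    // flatMap traverses each inner stream only when it reaches it, so the
    // list is not read until origStream has been fully consumed.
    Stream.of(origStream, savedForLaterStream)
        .flatMap(Function.identity())
        .forEach(i -> System.out.println(i));
}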

This code prints 10, 20, 40, 50, 31 as desired, though it will behave unpredictably if you parallelize it.

Note that my solution relies heavily on the current implementation of the Stream API in OpenJDK/Oracle JDK. The Stream API specification explicitly says that the predicate passed to filter must be stateless and non-interfering. Because these properties are violated here, the result is, by specification, unpredictable.

I appreciate the help from others, but I wanted to post my final solution.

I used a LinkedBlockingQueue and a custom Spliterator instead of an ArrayList. Calling Stream.concat immediately obtains the Spliterators of the argument streams (arguably unnecessarily). As others pointed out, the ArrayListSpliterator does not tolerate modification of the list once it has been generated.
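The failure can be reproduced without any stream at all. The following is a rough sketch, assuming OpenJDK's ArrayList spliterator, which is documented as late-binding and fail-fast; the class BoundSpliteratorDemo and its method names are hypothetical. Querying the size binds the spliterator to the list, and a later modification is then detected on traversal:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Spliterator;

class BoundSpliteratorDemo {
    /** Returns true if traversal fails after the list changes post-binding. */
    static boolean failsAfterBinding() {
        List<Integer> list = new ArrayList<>();
        Spliterator<Integer> spliterator = list.spliterator();

        // A size query binds the late-binding spliterator to the (empty) list.
        spliterator.estimateSize();

        // Structural modification after binding...
        list.add(30);

        try {
            // ...is detected when the spliterator is traversed.
            spliterator.forEachRemaining(i -> { });
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Without the size query, binding is deferred until traversal starts. */
    static List<Integer> traverseWithoutEarlyBinding() {
        List<Integer> list = new ArrayList<>();
        Spliterator<Integer> spliterator = list.spliterator();

        // Not yet bound, so this modification is allowed.
        list.add(30);

        List<Integer> seen = new ArrayList<>();
        spliterator.forEachRemaining(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(failsAfterBinding());
        System.out.println(traverseWithoutEarlyBinding());
    }
}
```

The second method shows why the list stream on its own is harmless: the problem only arises when something queries the spliterator before the list is fully populated.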

By default, LinkedBlockingQueue has a weakly consistent spliterator, which may return items added to the underlying queue after the spliterator was initialized. In my tests it consistently did. However, to avoid any chance of different behavior in production, I provided a custom spliterator that always returns items added to the underlying queue after its initialization. The QSpliterator code was copied from: https://codereview.stackexchange.com/a/105308

package test;

import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class TestStreams {
    private static LinkedBlockingQueue<Integer> savedForLater = new LinkedBlockingQueue<>();

    public static void main(String[] args) {
        Stream<Integer> origStream = Stream.of(10, 20, 30, 40, 50).filter(
                i -> saveThirtyForLater(i));
        Spliterator<Integer> qSpliterator = new QSpliterator<>(savedForLater);
        Stream<Integer> savedForLaterStream = StreamSupport.stream(
                qSpliterator, false).map(i -> addOne(i));

        Stream.concat(origStream, savedForLaterStream).forEach(
                i -> System.out.println(i));
    }

    private static Integer addOne(Integer i) {
        return i + 1; // autoboxed; the Integer(int) constructor is deprecated
    }

    private static boolean saveThirtyForLater(Integer i) {
        if (i == 30) {
            savedForLater.add(i);
            return false;
        }
        return true;
    }

    private static final class QSpliterator<T> implements Spliterator<T> {

        private final BlockingQueue<T> queue;

        public QSpliterator(BlockingQueue<T> queue) {
            this.queue = queue;
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            // poll() rather than take(): take() blocks forever once the queue
            // is empty, so a sequential forEach would never terminate.
            T next = queue.poll();
            if (next == null) {
                return false; // queue drained: end of stream
            }
            action.accept(next);
            return true;
        }

        @Override
        public Spliterator<T> trySplit() {
            try {
                final int size = queue.size();
                List<T> vals = new ArrayList<>(size + 1);
                vals.add(queue.take());
                queue.drainTo(vals);
                return vals.spliterator();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(
                        "Thread interrupted during trySplit.", e);
            }
        }

        @Override
        public long estimateSize() {
            return Long.MAX_VALUE;
        }

        @Override
        public int characteristics() {
            return Spliterator.CONCURRENT;
        }

    }
}
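For comparison, the behavior of the queue's default spliterator can be probed with a small sketch like the one below. The class WeaklyConsistentDemo and its method are hypothetical. Weak consistency guarantees only that no ConcurrentModificationException is thrown; whether the late addition is seen is not guaranteed by the specification, so no particular output is promised here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.LinkedBlockingQueue;

class WeaklyConsistentDemo {
    /** Traverses the default spliterator after a late addition to the queue. */
    static List<Integer> traverseAfterLateAdd() {
        LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Spliterator<Integer> spliterator = queue.spliterator();

        // Mimic Stream.concat, which queries the size up front.
        spliterator.estimateSize();

        // Added after the spliterator was created and queried.
        queue.add(30);

        List<Integer> seen = new ArrayList<>();
        // Weakly consistent: does not throw ConcurrentModificationException,
        // and may or may not include the element added above.
        spliterator.forEachRemaining(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(traverseAfterLateAdd());
    }
}
```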

