Spliterator skipping portions of text

I am having a problem with the stream methods dropWhile and takeWhile: the spliterator skips portions of the text in an alternating pattern (every odd or every even portion). What should be done to process all portions of the text? My methods are:

void read(Path filePath) {
    try {
        Stream<String> lines = Files.lines(filePath);
        while (true) {
            Spliterator<String> spliterator = lines.dropWhile(line -> !line.startsWith("FAYSAL:")).spliterator();
            Stream<String> portion = fetchNextPortion(spliterator);
            if(spliterator.estimateSize() == 0)
                break;
            portion.forEach(System.out::println);
            lines = StreamSupport.stream(spliterator, false);
        }
        lines.close();
    }
    catch (IOException e) {
        e.printStackTrace();
    }
}

private Stream<String> fetchNextPortion(Spliterator<String> spliterator) {
    return StreamSupport.stream(spliterator, false)
            .filter(this::isValidReportName)
            .peek(System.out::println)
            .findFirst()
            .map( first -> Stream.concat(Stream.of(first),
                    StreamSupport.stream(spliterator, false).takeWhile(line -> !line.startsWith("FAYSAL:")))).orElse(Stream.empty());
}

Sample input is:

FAYSAL: 1
Some text here
Some text here
FAYSAL: 2
Some text here
Some text here
FAYSAL: 3
Some text here
Some text here
FAYSAL: 4
Some text here
Some text here

It will skip FAYSAL: 2 and FAYSAL: 4

What should be done to process all portions of text?

You could choose a different approach.

Your code produced a StackOverflowError on my machine after displaying your problem (there is also a call to fetchNextChunk but only a method called fetchNextPartition, so I wasn't sure about that either). Instead of trying to debug it, I came up with a different way of splitting the input. Because my approach keeps the whole input in memory as a String, it might not be suitable for larger files. I might work out a version with Streams later.
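As a side note on the skipping itself: a likely cause is that takeWhile has to pull the next "FAYSAL:" line out of the shared spliterator in order to test it, and that rejected line is not handed back. The sketch below isolates this. The class name TakeWhileDemo and the shortened input are made up for illustration, and reading from a spliterator after a stream has used it relies on observed OpenJDK behavior rather than on anything the Stream API documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class TakeWhileDemo {

    // Returns two lists: the lines taken by takeWhile, and the lines
    // still left in the shared spliterator afterwards.
    static List<List<String>> run() {
        Spliterator<String> spliterator =
                List.of("text 1", "FAYSAL: 2", "text 2").spliterator();

        // takeWhile must advance past "FAYSAL: 2" to evaluate the predicate;
        // the element is consumed from the spliterator even though it is rejected.
        List<String> taken = StreamSupport.stream(spliterator, false)
                .takeWhile(line -> !line.startsWith("FAYSAL:"))
                .collect(Collectors.toList());

        // Whatever the spliterator still holds once the stream has finished.
        List<String> remaining = new ArrayList<>();
        spliterator.forEachRemaining(remaining::add);

        return List.of(taken, remaining);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On OpenJDK this prints [[text 1], [text 2]]: the "FAYSAL: 2" line is in neither list. That matches the symptom in the question, where the following dropWhile then runs on to the next header and every second portion disappears.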

Base assumption: You want to split your input text into portions, each portion starting with a string that starts with "FAYSAL:".

The idea is similar to your approach, but it is not based on Spliterators and it doesn't use dropWhile either. Instead, it finds the first string starting with "FAYSAL:" (I assumed that this is what isValidReportName does; the code for that method wasn't in the question) and takes every following line up to, but not including, the next portion start. The found first line is inserted at the front of that portion, and the portion is added to a list of results that can be used later. The number of lines collected is then removed from the front of the original list, and the loop repeats until no lines are left.

Full code:

import java.util.*;
import java.util.stream.Collectors;

class Main {

    public static void main(String[] args) {
        Main m = new Main();
        System.out.println(m.partitionTextByStringStart(m.getString()));
    }

    private List<List<String>> partitionTextByStringStart(String text) {
        List<List<String>> partitions = new ArrayList<>();
        List<String> lines = Arrays.asList(text.split("\n"));

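        while (!lines.isEmpty()) {
            // Assumes the remaining lines begin with a "FAYSAL:" header line.
            String first = lines.stream().filter(this::isValidReportName).findFirst().orElse("This is prolly bad");
            // Body of the portion: everything after the header, up to the next header.
            List<String> part = lines.stream().skip(1).takeWhile(l -> !l.startsWith("FAYSAL:")).collect(Collectors.toList());
            part.add(0, first);

            partitions.add(part);
            // Drop the lines just consumed (header plus body) before the next round.
            lines = lines.subList(part.size(), lines.size());
        }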

        return partitions;
    }

    private boolean isValidReportName(String x) {
        return x.startsWith("FAYSAL:");
    }

    private String getString() {
        return "FAYSAL: 1\n" +
                "Some text here1\n" +
                "Some text here1\n" +
                "FAYSAL: 2\n" +
                "Some text here2\n" +
                "Some text here2\n" +
                "FAYSAL: 3\n" +
                "Some text here3\n" +
                "Some text here3\n" +
                "FAYSAL: 4\n" +
                "Some text here4\n" +
                "Some text here4";
    }

}

(Note: I used a static string here instead of file reading to make a full code example; you can adapt your code accordingly)
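For larger files, where keeping the whole text in memory is a concern, the same grouping can be done line by line. The following is only a minimal sketch of that idea, not a tested replacement for the code above: the class name PortionReader and the method forEachPortion are hypothetical, and it assumes that lines appearing before the first "FAYSAL:" header can be ignored.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class PortionReader {

    // Hypothetical helper: hands each portion to the consumer as soon as it is
    // complete, so only one portion is held in memory at a time.
    static void forEachPortion(Iterator<String> lines, String prefix,
                               Consumer<List<String>> action) {
        List<String> current = null; // stays null until the first header is seen
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.startsWith(prefix)) {
                // A new header closes the previous portion, if there was one.
                if (current != null) {
                    action.accept(current);
                }
                current = new ArrayList<>();
            }
            // Assumption: lines before the first header are skipped.
            if (current != null) {
                current.add(line);
            }
        }
        // Emit the last portion, which is not followed by another header.
        if (current != null) {
            action.accept(current);
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "FAYSAL: 1", "Some text here1",
                "FAYSAL: 2", "Some text here2");
        forEachPortion(input.iterator(), "FAYSAL:", System.out::println);
    }
}
```

To use it with a file, you could open the lines with Files.lines(filePath) in a try-with-resources block and pass lines.iterator() to forEachPortion. Because every line is read exactly once and nothing is tested by a short-circuiting stream operation, no header line can be lost between portions.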

EDIT: After some research I found that grouping elements in a stream is surprisingly easy with a library called StreamEx (available on GitHub and via Maven). In another answer I found a note about the StreamEx#groupRuns function, which does exactly that:

private Stream<Stream<String>> partitionStreamByStringStart(Stream<String> lineStream) {
    return StreamEx.of(lineStream).groupRuns((l1, l2) -> !l2.startsWith("FAYSAL:")).map(Collection::stream);
}

To see it working, you can add

System.out.println(m.partitionStreamByStringStart(m.getStream()).map(
    s -> s.collect(Collectors.toList())
).collect(Collectors.toList()));

to the main function and

private Stream<String> getStream() {
    return Stream.of(getString().split("\n"));
}

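somewhere in the Main class of the full code example above. You will also need imports for java.util.stream.Stream and one.util.streamex.StreamEx (java.util.Collection is already covered by the java.util.* import).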
