Reading chunks of a text file with a Java 8 Stream

Question

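Java 8 has a way to create a Stream from the lines of a file. In that case, forEach steps through the file line by line. I have a text file in the following format.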

bunch of lines with text
$$$$
bunch of lines with text
$$$$

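I need to put all the lines before each $$$$ into a single element of the Stream.

In other words, I need a stream of strings, where each string contains the content before a $$$$.

What is the best way to do this (with minimal overhead)?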

Answer 1

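I could not come up with a solution that processes the lines lazily. I am not sure whether that is possible.

My solution produces an ArrayList. If you must have a Stream, just call stream() on it.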

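import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Stream;

public class DelimitedFile {
    public static void main(String[] args) throws IOException {
        List<String> lines = lines(Paths.get("delimited.txt"), "$$$$");
        for (int i = 0; i < lines.size(); i++) {
            System.out.printf("%d:%n%s%n", i, lines.get(i));
        }
    }

    public static List<String> lines(Path path, String delimiter) throws IOException {
        // Files.lines keeps the file open until the stream is closed.
        try (Stream<String> stream = Files.lines(path)) {
            // The accumulator is stateful, so this only works on a sequential stream.
            return stream.collect(ArrayList::new, new BiConsumer<ArrayList<String>, String>() {
                // True when the next non-delimiter line starts a new chunk.
                boolean add = true;

                @Override
                public void accept(ArrayList<String> lines, String line) {
                    if (delimiter.equals(line)) {
                        add = true;
                    } else {
                        if (add) {
                            lines.add(line);
                            add = false;
                        } else {
                            // Append the line to the chunk currently being built.
                            int i = lines.size() - 1;
                            lines.set(i, lines.get(i) + '\n' + line);
                        }
                    }
                }
            }, ArrayList::addAll);
        }
    }
}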

File contents:

bunch of lines with text
bunch of lines with text2
bunch of lines with text3
$$$$
2bunch of lines with text
2bunch of lines with text2
$$$$
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4
$$$$

Output:

0:
bunch of lines with text
bunch of lines with text2
bunch of lines with text3
1:
2bunch of lines with text
2bunch of lines with text2
2:
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4

Edit:

I finally came up with a solution that generates the Stream lazily:

public static Stream<String> lines(Path path, String delimiter) throws IOException {
    Stream<String> lines = Files.lines(path);
    Iterator<String> iterator = lines.iterator();
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<String>() {
        String nextLine;

        @Override
        public boolean hasNext() {
            if (nextLine != null) {
                return true;
            }
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (!delimiter.equals(line)) {
                    nextLine = line;
                    return true;
                }
            }
            lines.close();
            return false;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            StringBuilder sb = new StringBuilder(nextLine);
            nextLine = null;
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (delimiter.equals(line)) {
                    break;
                }
                sb.append('\n').append(line);
            }
            return sb.toString();
        }
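    }, Spliterator.ORDERED | Spliterator.NONNULL | Spliterator.IMMUTABLE), false)
            // Also close the file if the caller closes the stream before reading it to the end.
            .onClose(lines::close);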
}

Actually (or coincidentally), it is very similar to the implementation of BufferedReader.lines(), which Files.lines(Path) uses internally. Skipping both methods and using Files.newBufferedReader(Path) and BufferedReader.readLine() directly may reduce the overhead.
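
The readLine() idea mentioned above could look roughly like the following. This is only a sketch under assumptions: the class name ReaderChunks and the method chunks are made up for illustration, empty chunks are skipped as in the code above, and no measurement backs the claim that it is actually faster.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper, not part of the JDK.
final class ReaderChunks {

    /** Lazily groups the reader's lines into chunks separated by delimiter lines. */
    static Stream<String> chunks(BufferedReader reader, String delimiter) {
        Iterator<String> it = new Iterator<String>() {
            private String chunk; // chunk read ahead by hasNext(), or null

            @Override
            public boolean hasNext() {
                if (chunk != null) {
                    return true;
                }
                try {
                    StringBuilder sb = null;
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (delimiter.equals(line)) {
                            if (sb != null) {
                                break;    // end of a non-empty chunk
                            }
                            continue;     // skip empty chunks
                        }
                        if (sb == null) {
                            sb = new StringBuilder(line);
                        } else {
                            sb.append('\n').append(line);
                        }
                    }
                    chunk = (sb == null) ? null : sb.toString();
                    return chunk != null;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String result = chunk;
                chunk = null;
                return result;
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED | Spliterator.NONNULL), false)
                .onClose(() -> {
                    try {
                        reader.close();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
    }

    public static void main(String[] args) {
        // A StringReader stands in for Files.newBufferedReader(path) to keep the demo self-contained.
        BufferedReader in = new BufferedReader(new StringReader("a\nb\n$$$$\nc\n$$$$\n"));
        try (Stream<String> s = chunks(in, "$$$$")) {
            List<String> result = s.collect(Collectors.toList());
            System.out.println(result.size() + " chunks");
        }
    }
}
```

For a real file, pass Files.newBufferedReader(path) and keep the returned stream in a try-with-resources block so the reader gets closed.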

Answer 2

You can try:

    List<String> list = new ArrayList<>();
    try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
            list = stream
                .filter(line -> !line.equals("$$$$"))
                .collect(Collectors.toList());
    } catch (IOException e) {
        e.printStackTrace();
    }

Answer 3

A similar short answer already exists, but the following is type-safe and has no additional state:

    Path path = Paths.get("... .txt");
    try {
        List<StringBuilder> glist = Files.lines(path, StandardCharsets.UTF_8)
                .collect(() -> new ArrayList<StringBuilder>(),
                        (list, line) -> {
                            if (list.isEmpty() || list.get(list.size() - 1).toString().endsWith("$$$$\n")) {
                                list.add(new StringBuilder());
                            }
                            list.get(list.size() - 1).append(line).append('\n');
                        },
                        (list1, list2) -> {
                            if (!list1.isEmpty() && !list1.get(list1.size() - 1).toString().endsWith("$$$$\n")
                                    && !list2.isEmpty()) {
                                // Merge last of list1 and first of list2:
                                list1.get(list1.size() - 1).append(list2.remove(0).toString());
                            }
                            list1.addAll(list2);
                        });
        glist.forEach(sb -> System.out.printf("------------------%n%s%n", sb));
    } catch (IOException ex) {
        Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
    }

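Instead of .endsWith("$$$$\n"), it would be better to match a whole delimiter line. String.matches has to match the entire string, so the pattern allows for any preceding lines:

.matches("(?s)(.*\n)?\\$\\$\\$\\$\n")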
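
To illustrate why a line-anchored pattern is stricter than endsWith, here is a small sketch. The class DelimiterCheck and the helper endsWithDelimiterLine are hypothetical names introduced for this example; the pattern requires the final line to consist solely of $$$$.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class DelimiterCheck {
    // The text must end with a line that is exactly "$$$$" followed by '\n'.
    // (?s) lets '.' match line breaks; the optional group covers any preceding lines.
    private static final Pattern ENDS_WITH_DELIMITER_LINE =
            Pattern.compile("(?s)(.*\n)?\\$\\$\\$\\$\n");

    static boolean endsWithDelimiterLine(CharSequence chunk) {
        return ENDS_WITH_DELIMITER_LINE.matcher(chunk).matches();
    }

    public static void main(String[] args) {
        System.out.println(endsWithDelimiterLine("text\n$$$$\n"));  // true
        System.out.println(endsWithDelimiterLine("$$$$\n"));        // true
        System.out.println(endsWithDelimiterLine("cost: $$$$\n"));  // false
        System.out.println("cost: $$$$\n".endsWith("$$$$\n"));      // true
    }
}
```

Because the helper takes a CharSequence, a StringBuilder can be passed directly without calling toString() first.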

Answer 4

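Here is a solution based on earlier work:

import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ChunkSpliterator extends Spliterators.AbstractSpliterator<List<String>> {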
    private final Spliterator<String> source;
    private final Predicate<String> delimiter;
    private final Consumer<String> getChunk;
    private List<String> current;

    ChunkSpliterator(Spliterator<String> lineSpliterator, Predicate<String> mark) {
        super(lineSpliterator.estimateSize(), ORDERED|NONNULL);
        source=lineSpliterator;
        delimiter=mark;
        getChunk=s -> {
            if(current==null) current=new ArrayList<>();
            current.add(s);
        };
    }
    public boolean tryAdvance(Consumer<? super List<String>> action) {
        while(current==null || !delimiter.test(current.get(current.size()-1)))
            if(!source.tryAdvance(getChunk)) return lastChunk(action);
        current.remove(current.size()-1);
        action.accept(current);
        current=null;
        return true;
    }
    private boolean lastChunk(Consumer<? super List<String>> action) {
        if(current==null) return false;
        action.accept(current);
        current=null;
        return true;
    }

    public static Stream<List<String>> toChunks(
        Stream<String> lines, Predicate<String> splitAt, boolean parallel) {
        return StreamSupport.stream(
            new ChunkSpliterator(lines.spliterator(), splitAt),
            parallel);
    }
}

You can use it like this:

try(Stream<String> lines=Files.lines(pathToYourFile)) {
    ChunkSpliterator.toChunks(
        lines,
        Pattern.compile("^\\Q$$$$\\E$").asPredicate(),
        false)
    /* chain your stream operations, e.g.
    .forEach(s -> { s.forEach(System.out::print); System.out.println(); })
     */;
}
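
Since the question asks for a stream of strings while the chunks here are lists of lines, one possible follow-up step is to join each list. This is a sketch under assumptions: ChunkJoiner and joinChunks are hypothetical names, and a hand-built stream stands in for the result of ChunkSpliterator.toChunks so that the example runs on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
final class ChunkJoiner {

    /** Turns each chunk (a list of lines) into one string, with lines separated by '\n'. */
    static Stream<String> joinChunks(Stream<List<String>> chunks) {
        return chunks.map(lines -> String.join("\n", lines));
    }

    public static void main(String[] args) {
        // Stand-in for the output of ChunkSpliterator.toChunks(...).
        Stream<List<String>> chunks = Stream.of(
                Arrays.asList("line 1", "line 2"),
                Arrays.asList("line 3"));
        List<String> joined = joinChunks(chunks).collect(Collectors.toList());
        System.out.println(joined.size()); // prints 2
    }
}
```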

Answer 5

You can use a Scanner as an iterator and create a stream from it:

private static Stream<String> recordStreamOf(Readable source) {
    Scanner scanner = new Scanner(source);
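    // useDelimiter takes a regex, so the literal "$$$$" must be quoted (java.util.regex.Pattern).
    scanner.useDelimiter(Pattern.quote("$$$$"));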
    return StreamSupport
        .stream(Spliterators.spliteratorUnknownSize(scanner, Spliterator.ORDERED | Spliterator.NONNULL), false)
        .onClose(scanner::close);
}

This keeps the line breaks in the chunks, for further filtering or splitting.
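
The further filtering and splitting might look like the sketch below. The names ScannerRecords, records and cleanRecords are hypothetical. The sketch quotes the delimiter with Pattern.quote because Scanner.useDelimiter interprets its argument as a regular expression. Note that a Scanner splits on $$$$ anywhere in the text, not only where it occupies a whole line.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper for illustration only.
final class ScannerRecords {

    /** Streams the text between "$$$$" markers, line breaks included. */
    static Stream<String> records(Readable source) {
        Scanner scanner = new Scanner(source);
        scanner.useDelimiter(Pattern.quote("$$$$")); // literal delimiter, not a regex
        return StreamSupport
                .stream(Spliterators.spliteratorUnknownSize(
                        scanner, Spliterator.ORDERED | Spliterator.NONNULL), false)
                .onClose(scanner::close);
    }

    /** Trims each record, drops empty ones, and splits the rest into lines. */
    static List<List<String>> cleanRecords(Readable source) {
        try (Stream<String> records = records(source)) {
            return records
                    .map(String::trim)                        // remove surrounding line breaks
                    .filter(r -> !r.isEmpty())                // drop the empty tail after the last marker
                    .map(r -> Arrays.asList(r.split("\\R")))  // one entry per line
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        List<List<String>> recs = cleanRecords(new StringReader("a\nb\n$$$$\nc\n$$$$\n"));
        System.out.println(recs); // prints [[a, b], [c]]
    }
}
```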

5 answers

Answer 1
2 2016-10-10 08:50:16

Answer 2
0 2016-10-10 07:46:37

Answer 3
0 2016-10-10 09:53:03

Answer 4
0 2016-10-10 16:51:57

Answer 5
0 2017-05-15 22:18:55
