Read X lines at a time from a text file using Java Streams?

I have a "plain old text file" where lines end with a newline character. For arbitrary reasons I need to read and parse this text file 4 (X for generality) lines at a time.

I'd like to use Java streams for this task, and I know I can turn the file into a stream like so:

try (Stream<String> stream = Files.lines(Paths.get("file.txt"))) {
    stream.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}

But how can I use Java's Stream API to "bunch" the file into groups of 4 consecutive lines?

This is a job for java.util.Scanner. In Java 9, you can simply use

try(Scanner s = new Scanner(PATH)) {
    s.findAll("(.*\\R){1,4}")
     .map(mr -> Arrays.asList(mr.group().split("\\R")))
     .forEach(System.out::println);
}

For Java 8, you can use the back-port of findAll from this answer. After adding an import static for that method, you can use it like

try(Scanner s = new Scanner(PATH)) {
    findAll(s, Pattern.compile("(.*\\R){1,4}"))
        .map(mr -> Arrays.asList(mr.group().split("\\R")))
        .forEach(System.out::println);
}
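
The back-port referred to above is not reproduced on this page. As a rough idea of what such a helper might look like on Java 8, here is a minimal sketch built on Scanner.findWithinHorizon and Scanner.match. The class name ScannerChunks and the helper chunks are made up for illustration, and this is not necessarily how the linked answer implements it.

```java
import java.util.List;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ScannerChunks {

    // Hypothetical Java 8 stand-in for Scanner.findAll (Java 9+): each
    // tryAdvance call searches for the next match and passes it downstream.
    static Stream<MatchResult> findAll(Scanner s, Pattern pattern) {
        return StreamSupport.stream(
            new Spliterators.AbstractSpliterator<MatchResult>(
                    Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
                @Override
                public boolean tryAdvance(Consumer<? super MatchResult> action) {
                    // A horizon of 0 means "search the rest of the input"
                    if (s.findWithinHorizon(pattern, 0) == null) {
                        return false;
                    }
                    action.accept(s.match());
                    return true;
                }
            }, false);
    }

    // Illustrative helper: returns chunks of up to `size` lines as single strings.
    // Every line, including the last one, is assumed to end with a line break.
    static List<String> chunks(String text, int size) {
        try (Scanner s = new Scanner(text)) {
            return findAll(s, Pattern.compile("(.*\\R){1," + size + "}"))
                    .map(mr -> mr.group())
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // Prints "4 line(s)" and then "1 line(s)"
        chunks("a\nb\nc\nd\ne\n", 4)
                .forEach(c -> System.out.println(c.split("\\R").length + " line(s)"));
    }
}
```

Because the pattern requires a line break after every line, a final line without a trailing newline would not be matched by this sketch.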

Note that the result of the match operation is a single string containing up to four lines (fewer for the last chunk). If that's suitable for your follow-up operation, you can skip splitting that string into individual lines.

You may even use the MatchResult's properties for more sophisticated processing of the chunks, e.g.

try(Scanner s = new Scanner(PATH)) {
    findAll(s, Pattern.compile("(.*)\\R(?:(.*)\\R)?(?:(.*)\\R)?(?:(.*)\\R)?"))
        .flatMap(mr -> IntStream.rangeClosed(1, 4)
                           .mapToObj(ix -> mr.group(ix)==null? null: ix+": "+mr.group(ix)))
        .filter(Objects::nonNull)
        .forEach(System.out::println);
}

There is a way to partition and process your file content into n-size chunks using the standard Java 8 Stream API. You can use Collectors.groupingBy() to partition your file content into chunks: you can collect them as a Collection<List<String>>, or you can apply some processing while collecting all lines (e.g. you can join them into a single String).

Take a look at the following example:

import java.io.IOException;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReadFileWithStream {

    public static void main(String[] args) throws IOException, URISyntaxException {
        // Path to a file to read (input.txt is expected on the classpath)
        final Path path = Paths.get(ReadFileWithStream.class.getResource("/input.txt").toURI());
        // Running line index; only safe because the stream is sequential
        final AtomicInteger counter = new AtomicInteger(0);
        // Size of a chunk
        final int size = 4;

        // try-with-resources closes the file handle opened by Files.lines
        try (Stream<String> lines = Files.lines(path)) {
            final Collection<List<String>> partitioned = lines
                    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / size))
                    .values();

            partitioned.forEach(System.out::println);
        }
    }
}

My input file contains some numbers (one number per line), and when I run the code above I get something like:

[0, 0, 0, 2]
[0, -3, 2, 0]
[1, -3, -8, 0]
[2, -12, -11, -11]
[-8, -1, -8, 0]
[2, -1, 2, -1]
... and so on

Collectors.groupingBy() also allows me to use a different downstream collector. By default Collectors.toList() is used, so each group is accumulated into a List<String> and I get a Collection<List<String>> as the final result.

Let's say I want to read chunks of size 4 and sum all the numbers in each chunk. In this case I will use Collectors.summingInt() as my downstream collector, and the returned result is a Collection<Integer>:

final Collection<Integer> partitioned = Files.lines(path)
        .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / size, Collectors.summingInt(Integer::valueOf)))
        .values();

Output:

2
-1
-10
-32
-17
2
-11
-49
... and so on

And last but not least: Collectors.groupingBy() returns a map where values are grouped by specific keys. That's why in the end we call Map.values() to get a collection of the values contained in this map.
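
Since the result is a map, the order of the chunks depends on the map implementation, and the single-argument groupingBy() makes no promise about which one it uses. A minimal sketch that makes the order explicit by asking for a TreeMap, combined with Collectors.joining() as the downstream collector, could look like this. The class name OrderedChunks and the method joinChunks are illustrative only, and the input is an in-memory list rather than a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class OrderedChunks {

    // Groups consecutive lines into chunks of `size` and joins each chunk with ", ".
    // Sequential streams only: the counter is shared mutable state.
    static List<String> joinChunks(List<String> lines, int size) {
        AtomicInteger counter = new AtomicInteger();
        return new ArrayList<>(lines.stream()
                .collect(Collectors.groupingBy(
                        line -> counter.getAndIncrement() / size, // chunk index as key
                        TreeMap::new,                             // keys kept in ascending order
                        Collectors.joining(", ")))                // one String per chunk
                .values());
    }

    public static void main(String[] args) {
        // Prints "1, 2", "3, 4" and "5" on separate lines
        joinChunks(Arrays.asList("1", "2", "3", "4", "5"), 2)
                .forEach(System.out::println);
    }
}
```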

Hope it helps.

Here's a straightforward way using Guava's Iterators.partition method:

try (Stream<String> stream = Files.lines(Paths.get("file.txt"))) {

    Iterator<List<String>> iterator = Iterators.partition(stream.iterator(), 4);

    // iterator.next() returns each chunk as a List<String>

} catch (IOException e) {
    // handle exception properly
}

This is only suitable for sequential processing, but if you are reading a file from disk, I can hardly imagine any benefit from parallel processing...


EDIT: If you want, instead of working with the iterator, you could convert it back to a stream:

Stream<List<String>> targetStream = StreamSupport.stream(
      Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED),
      false);
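
If adding Guava is not an option, the same idea can be sketched with the JDK alone. The following is only an illustration of the approach, not Guava's implementation; the class name ChunkedStream and the method chunked are invented for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ChunkedStream {

    // Wraps a sequential stream so that it yields lists of up to `size` elements.
    static <T> Stream<List<T>> chunked(Stream<T> source, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be positive");
        }
        Iterator<T> it = source.iterator();
        Iterator<List<T>> chunks = new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public List<T> next() {
                if (!it.hasNext()) {
                    throw new NoSuchElementException();
                }
                // Pull elements until the chunk is full or the source is exhausted
                List<T> chunk = new ArrayList<>(size);
                while (chunk.size() < size && it.hasNext()) {
                    chunk.add(it.next());
                }
                return chunk;
            }
        };
        return StreamSupport.stream(
                    Spliterators.spliteratorUnknownSize(chunks, Spliterator.ORDERED), false)
                .onClose(source::close); // closing the result also closes the source
    }

    public static void main(String[] args) {
        // Prints [a, b], [c, d] and [e] on separate lines
        chunked(Stream.of("a", "b", "c", "d", "e"), 2).forEach(System.out::println);
    }
}
```

With a file, the result of chunked(Files.lines(path), 4) would still need to be used inside a try-with-resources block so that the underlying file handle gets closed.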

If you want to stick with streams, the only solution I see is to write your own custom collector. Collectors are not intended for this purpose, but you can make use of one.

// Requires: java.util.ArrayList, java.util.List, java.util.stream.Collector
private static final class CustomCollector {

    // Finished chunks, each joined into one string
    private List<String> list = new ArrayList<>();

    // Lines of the chunk currently being filled
    private List<String> accumulateList = new ArrayList<>();

    public void accept(String str) {
        accumulateList.add(str);
        if (accumulateList.size() == 4) { // accumulate 4 strings
            String collect = String.join("", accumulateList);
            // I just joined them into one string, you can do whatever you want
            list.add(collect);
            accumulateList = new ArrayList<>();
        }
    }

    public CustomCollector combine(CustomCollector other) {
        throw new UnsupportedOperationException("Parallel Stream not supported");
    }

    public List<String> finish() {
        // Flush the last, possibly incomplete, chunk
        if (!accumulateList.isEmpty()) {
            list.add(String.join("", accumulateList));
        }
        return list;
    }

    public static Collector<String, ?, List<String>> collector() {
        return Collector.of(CustomCollector::new, CustomCollector::accept, CustomCollector::combine, CustomCollector::finish);
    }
}

And use it like so:

stream.collect(CustomCollector.collector());
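
If you would rather keep each chunk as a list instead of a joined string, the same idea can be sketched more generically with Collector.of. The names ChunkCollector and chunking below are hypothetical, and like the collector above this variant is meant for sequential streams only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class ChunkCollector {

    // Collects elements into consecutive lists of up to `size` elements.
    static <T> Collector<T, ?, List<List<T>>> chunking(int size) {
        return Collector.<T, List<List<T>>>of(
                ArrayList::new,
                (chunks, item) -> {
                    // Start a new chunk when there is none yet or the last one is full
                    if (chunks.isEmpty() || chunks.get(chunks.size() - 1).size() == size) {
                        chunks.add(new ArrayList<>());
                    }
                    chunks.get(chunks.size() - 1).add(item);
                },
                (left, right) -> {
                    // Merging partial results would break the chunk boundaries
                    throw new UnsupportedOperationException("Parallel streams not supported");
                });
    }

    public static void main(String[] args) {
        List<List<String>> chunks =
                Stream.of("a", "b", "c", "d", "e").collect(ChunkCollector.<String>chunking(2));
        System.out.println(chunks); // prints [[a, b], [c, d], [e]]
    }
}
```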

If you're open to using RxJava, you could use its buffer capability:

Stream<String> stream = Files.lines(Paths.get("file.txt"));

Observable.fromIterable(stream::iterator)
          .buffer(4)                      // Observable<List<String>>
          .map(x -> String.join(", ", x)) // Observable<String>
          .forEach(System.out::println);

buffer creates an Observable that collects elements in lists of a certain size. In the above example, I added another transformation via map to make the list more print-friendly, but you can transform the Observable as you see fit. For example, if you had a method processChunk that took a List<String> as an argument and returned a String, you could do:

Observable<String> fileObs =
    Observable.fromIterable(stream::iterator)
              .buffer(4)
              .map(x -> processChunk(x));
