How to read text file by block with Java 8 streams

I want to read an ASCII file that is composed of "blocks" delimited by start and end tags.

I have never used Java 8 streams and I would like to try them on this file reader, but I don't really know how to do it.

For the sake of simplicity, let's consider the following file format (the actual file format can be found here):

$Node
6
1 1.0 0.0 0.0
2 -1.0 0.0 0.0
3 0.0 1.0 0.0
4 0.0 -1.0 0.0
5 0.0 0.0 1.0
6 0.0 0.0 -1.0
$EndNode
$Elements
3
1 10 1 2 3
2 10 4 5 6
3 10 1 5 3
$EndElements

The first line of each block is the number of elements in the block. Each following line is a list of space-separated values; the number and types of the values vary depending on the block.

In real life, the file can get pretty big (several hundred MB, maybe up to a few GB), so performance is critical.

Using Java NIO 2 (without the Java 8 streams), I would have done something like this:

BufferedReader reader = Files.newBufferedReader(filePath, Charset.defaultCharset());
String line = null;
Parser currentParser = defaultParser;
while ((line = reader.readLine()) != null) {
    if (line.startsWith("$")) {
        currentParser = getParser(line);
        continue;
    }
    currentParser.parseLine(line);
}

This assumes a line parser that is smart enough to handle the first line of the block differently than the rest (without having to test an isFirstLineOfBlock boolean for each line). I don't know yet how to do that either, by the way.

Anyway, I would appreciate some help with using Java 8 streams for this file reader.

Final question: what is the advantage of using Java streams for such an application? Is it just a question of readability, or can I expect improved performance?

There are several ways to do it with Java 8 streams. For example:

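// filePath is assumed to be a String here; the charset belongs to newBufferedReader, not Paths.get
try (BufferedReader br = Files.newBufferedReader(Paths.get(filePath), Charset.defaultCharset())) {
    br.lines()
        .filter(line -> !line.startsWith("$"))   // skip the tag lines
        .forEachOrdered(currentParser::parseLine);
} catch (IOException ex) {
    throw new UncheckedIOException(ex);
}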

The description of the .lines() method contains:

The Stream is lazily populated, i.e., 
read only occurs during the terminal stream operation.

In this example the terminal operation is forEachOrdered.
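The lazy behaviour quoted above can be observed directly. The following is only a small sketch: CountingReader and readsBeforeAndAfter are hypothetical names made up for this illustration, and an in-memory StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyLinesDemo {

    /** Hypothetical helper: counts how often the underlying source is actually read. */
    static final class CountingReader extends Reader {
        private final Reader delegate;
        int reads = 0;

        CountingReader(Reader delegate) {
            this.delegate = delegate;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            reads++;
            return delegate.read(buf, off, len);
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }

    /** Returns {reads before the terminal operation, reads after it, number of kept lines}. */
    static int[] readsBeforeAndAfter(String text) {
        CountingReader source = new CountingReader(new StringReader(text));
        try (BufferedReader br = new BufferedReader(source)) {
            // Building the pipeline (lines + filter) does not touch the source yet.
            Stream<String> lines = br.lines().filter(line -> !line.startsWith("$"));
            int before = source.reads;
            // The terminal operation is what pulls the lines through the reader.
            List<String> kept = lines.collect(Collectors.toList());
            return new int[] {before, source.reads, kept.size()};
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        int[] r = readsBeforeAndAfter("$Node\n1 1.0 0.0 0.0\n$EndNode\n");
        System.out.println("reads before: " + r[0] + ", reads after: " + r[1] + ", kept lines: " + r[2]);
    }
}
```

The read count stays at zero until collect runs, which is why a large file is not loaded just by building the stream.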


Another one:

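// filePath is assumed to be a String here; the charset belongs to Files.lines, not Paths.get
try (Stream<String> stream = Files.lines(Paths.get(filePath), Charset.defaultCharset())) {
    stream
        .filter(line -> !line.startsWith("$"))   // skip the tag lines
        .forEachOrdered(currentParser::parseLine);
} catch (IOException ex) {
    throw new UncheckedIOException(ex);
}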
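Both snippets above simply drop the tag lines, so every data line goes to the same parser. If the parser has to change per block, as in the loop from the question, one possible approach is a stateful consumer passed to forEachOrdered. This is a rough sketch under stated assumptions: Parser, CountingBlockParser and Dispatcher are hypothetical types invented for this illustration, and each block is assumed to start with its element count. It only works on a sequential stream.

```java
import java.io.BufferedReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class BlockDispatch {

    /** Hypothetical parser interface, mirroring parseLine from the question. */
    interface Parser {
        void parseLine(String line);
    }

    /** Reads the count line first, then swaps its handler so no per-line flag test is needed. */
    static final class CountingBlockParser implements Parser {
        int expected = -1;
        final List<String[]> rows = new ArrayList<>();
        private Consumer<String> handler = this::readCount;

        private void readCount(String line) {
            expected = Integer.parseInt(line.trim());
            handler = this::readRow; // all later lines go straight to readRow
        }

        private void readRow(String line) {
            rows.add(line.trim().split("\\s+"));
        }

        @Override
        public void parseLine(String line) {
            handler.accept(line);
        }
    }

    /** Stateful consumer: tag lines switch the current parser, other lines are forwarded to it. */
    static final class Dispatcher implements Consumer<String> {
        private static final Parser IGNORE = line -> { };
        final Map<String, CountingBlockParser> blocks = new LinkedHashMap<>();
        private Parser current = IGNORE;

        @Override
        public void accept(String line) {
            if (line.trim().isEmpty()) {
                return; // skip blank lines
            }
            if (line.startsWith("$End")) {
                current = IGNORE; // end tag: ignore lines until the next block starts
            } else if (line.startsWith("$")) {
                CountingBlockParser parser = new CountingBlockParser();
                blocks.put(line.substring(1).trim(), parser);
                current = parser;
            } else {
                current.parseLine(line);
            }
        }
    }

    /** Parses all blocks from the reader; the caller is responsible for closing it. */
    static Map<String, CountingBlockParser> parse(BufferedReader reader) {
        Dispatcher dispatcher = new Dispatcher();
        reader.lines().forEachOrdered(dispatcher); // sequential and ordered, so the state is safe
        return dispatcher.blocks;
    }
}
```

With the sample file, parse(reader).get("Node") would hold the six node rows. Compared with the plain while loop, this is mainly a difference in style: the work is still a single sequential pass over the lines.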

It's possible to parse such constructs with the help of my free StreamEx library, which enhances the standard Stream API:

StreamEx.ofLines(filePath, Charset.defaultCharset())
        .groupRuns((a, b) -> !b.startsWith("$"))
        .forEachOrdered(list -> 
            list.subList(1, list.size()).forEach(getParser(list.get(0))::parseLine));

Here we use the groupRuns method, which collects adjacent lines of the file into lists. The argument passed to groupRuns is a BiPredicate that is applied to each pair of adjacent input elements and should return true if the two elements belong to the same group. Here we keep grouping elements unless the next one starts with "$". The result is a lazily populated Stream<List<String>>; we then parse each group, creating the parser from the first line and calling parseLine for all subsequent lines.
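To make the grouping rule concrete, here is a plain-JDK sketch of the same idea. It is not the StreamEx implementation: the class GroupRunsSketch and its groupRuns method are hypothetical, and unlike the StreamEx version this one is eager and builds all groups in memory, so it is meant only to illustrate the semantics on small inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

class GroupRunsSketch {

    /**
     * Splits items into runs: an item joins the current run when
     * sameGroup.test(previous, item) is true, otherwise it starts a new run.
     */
    static <T> List<List<T>> groupRuns(Iterable<T> items, BiPredicate<? super T, ? super T> sameGroup) {
        List<List<T>> groups = new ArrayList<>();
        List<T> current = null;
        T previous = null;
        for (T item : items) {
            if (current == null || !sameGroup.test(previous, item)) {
                current = new ArrayList<>(); // start a new run
                groups.add(current);
            }
            current.add(item);
            previous = item;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "$Node", "2", "1 1.0 0.0 0.0", "2 -1.0 0.0 0.0", "$EndNode",
            "$Elements", "1", "1 10 1 2 3", "$EndElements");
        // Same predicate as above: keep grouping unless the next line starts with "$".
        for (List<String> group : groupRuns(lines, (a, b) -> !b.startsWith("$"))) {
            System.out.println(group);
        }
    }
}
```

Note that with this predicate each end tag such as $EndNode forms a group of its own containing just the tag line, so getParser is also called for end tags, with no lines to parse afterwards.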
