Reading chunks of a text file with a Java 8 Stream
Java 8 has a way to create a Stream from the lines of a file. In that case, forEach steps through the lines one by one. I have a text file in the following format:
bunch of lines with text
$$$$
bunch of lines with text
$$$$
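I need each group of lines that precedes a $$$$ line to become a single element of the Stream.
In other words, I need a Stream of Strings, where each String contains the content before a $$$$.
What is the best way (with minimal overhead) to do that?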
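I could not come up with a solution that processes the lines lazily, and I am not sure whether that is possible.
My solution produces an ArrayList. If you have to use a Stream, just call stream() on it.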
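import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Stream;

public class DelimitedFile {
    public static void main(String[] args) throws IOException {
        List<String> lines = lines(Paths.get("delimited.txt"), "$$$$");
        for (int i = 0; i < lines.size(); i++) {
            System.out.printf("%d:%n%s%n", i, lines.get(i));
        }
    }

    public static List<String> lines(Path path, String delimiter) throws IOException {
        // try-with-resources closes the file; the stream must stay sequential
        // because the accumulator below keeps mutable state.
        try (Stream<String> stream = Files.lines(path)) {
            return stream.collect(ArrayList::new, new BiConsumer<ArrayList<String>, String>() {
                // true when the next non-delimiter line starts a new chunk
                boolean add = true;

                @Override
                public void accept(ArrayList<String> lines, String line) {
                    if (delimiter.equals(line)) {
                        add = true;
                    } else {
                        if (add) {
                            lines.add(line);
                            add = false;
                        } else {
                            // append the line to the chunk currently being built
                            int i = lines.size() - 1;
                            lines.set(i, lines.get(i) + '\n' + line);
                        }
                    }
                }
            }, ArrayList::addAll);
        }
    }
}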
File contents:
bunch of lines with text
bunch of lines with text2
bunch of lines with text3
$$$$
2bunch of lines with text
2bunch of lines with text2
$$$$
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4
$$$$
Output:
0:
bunch of lines with text
bunch of lines with text2
bunch of lines with text3
1:
2bunch of lines with text
2bunch of lines with text2
2:
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4
Edit:
I finally came up with a solution that generates the Stream lazily:
// Needs Iterator, NoSuchElementException, Spliterator and Spliterators from java.util,
// plus Stream and StreamSupport from java.util.stream.
public static Stream<String> lines(Path path, String delimiter) throws IOException {
    Stream<String> lines = Files.lines(path);
    Iterator<String> iterator = lines.iterator();
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<String>() {
        // first line of the next chunk, or null if it has not been read yet
        String nextLine;

        @Override
        public boolean hasNext() {
            if (nextLine != null) {
                return true;
            }
            // skip delimiter lines until the next chunk starts
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (!delimiter.equals(line)) {
                    nextLine = line;
                    return true;
                }
            }
            // end of file reached: release the underlying file
            lines.close();
            return false;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            StringBuilder sb = new StringBuilder(nextLine);
            nextLine = null;
            // collect lines up to the next delimiter or the end of the file
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (delimiter.equals(line)) {
                    break;
                }
                sb.append('\n').append(line);
            }
            return sb.toString();
        }
    }, Spliterator.ORDERED | Spliterator.NONNULL | Spliterator.IMMUTABLE), false);
}
This is actually (or coincidentally) very similar to the implementation of BufferedReader.lines(), which Files.lines(Path) uses internally. Skipping both of those methods and using Files.newBufferedReader(Path) and BufferedReader.readLine() directly might reduce the overhead.
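The readLine() variant mentioned above could look roughly like the following. This is a minimal sketch, not the answerer's code: the class name ReaderChunks and the helper chunks are hypothetical, IOExceptions are wrapped in UncheckedIOException as one possible choice, and whether it actually has less overhead has not been measured here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ReaderChunks {

    // Hypothetical helper: lazily groups the lines read from `reader`
    // into chunks separated by lines equal to `delimiter`.
    static Stream<String> chunks(BufferedReader reader, String delimiter) {
        Iterator<String> it = new Iterator<String>() {
            private String first;   // first line of the next chunk, once found
            private boolean done;   // set when the end of the input was reached

            @Override
            public boolean hasNext() {
                if (first != null) return true;
                if (done) return false;
                try {
                    String line;
                    // skip delimiter lines until a chunk starts
                    while ((line = reader.readLine()) != null) {
                        if (!delimiter.equals(line)) {
                            first = line;
                            return true;
                        }
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                done = true;
                return false;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                StringBuilder sb = new StringBuilder(first);
                first = null;
                try {
                    String line;
                    // append lines until the next delimiter or end of input
                    while ((line = reader.readLine()) != null && !delimiter.equals(line)) {
                        sb.append('\n').append(line);
                    }
                    if (line == null) done = true;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return sb.toString();
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED | Spliterator.NONNULL), false)
            .onClose(() -> {
                try {
                    reader.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
    }

    public static void main(String[] args) {
        // In-memory input keeps the demo self-contained.
        String text = "a\nb\n$$$$\nc\n$$$$\n";
        try (Stream<String> s = chunks(new BufferedReader(new StringReader(text)), "$$$$")) {
            System.out.println(s.collect(Collectors.toList()));
        }
    }
}
```

For a real file, the reader would come from Files.newBufferedReader(path), and the returned stream should be used in a try-with-resources block so that the onClose handler releases the file even when the stream is not fully consumed.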
You could try:
List<String> list = new ArrayList<>();
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    // drops the delimiter lines; the remaining lines stay separate elements
    list = stream
        .filter(line -> !line.equals("$$$$"))
        .collect(Collectors.toList());
} catch (IOException e) {
    e.printStackTrace();
}
A similar short answer already exists, but the following is type-safe and has no additional state:
Path path = Paths.get("... .txt");
// try-with-resources closes the file once the lines have been collected
try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
    List<StringBuilder> glist = lines
        .collect(() -> new ArrayList<StringBuilder>(),
            (list, line) -> {
                // start a new chunk at the beginning and after each delimiter line
                if (list.isEmpty() || list.get(list.size() - 1).toString().endsWith("$$$$\n")) {
                    list.add(new StringBuilder());
                }
                list.get(list.size() - 1).append(line).append('\n');
            },
            (list1, list2) -> {
                if (!list1.isEmpty() && !list1.get(list1.size() - 1).toString().endsWith("$$$$\n")
                        && !list2.isEmpty()) {
                    // Merge last of list1 and first of list2:
                    list1.get(list1.size() - 1).append(list2.remove(0).toString());
                }
                list1.addAll(list2);
            });
    glist.forEach(sb -> System.out.printf("------------------%n%s%n", sb));
} catch (IOException ex) {
    Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
}
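Instead of .endsWith("$$$$\n"), it would be better to check that the delimiter occupies a whole line. Because String.matches must match the entire string, the pattern also has to cover the preceding lines:
.matches("(?s)(.*\n)?\\$\\$\\$\\$\n")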
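To illustrate why a whole-line check is stricter than endsWith, here is a small sketch. The class name DelimiterCheck and the exact pattern are assumptions made for this example; the pattern uses (?s) so that . also matches newlines, and it is matched against the entire text.

```java
import java.util.regex.Pattern;

class DelimiterCheck {

    // Matches text whose last line consists solely of $$$$ followed by a newline.
    static final Pattern ENDS_WITH_DELIMITER_LINE =
        Pattern.compile("(?s)(.*\n)?\\$\\$\\$\\$\n");

    // Accepts any CharSequence, so a StringBuilder can be tested without toString().
    static boolean endsWithDelimiterLine(CharSequence chunk) {
        return ENDS_WITH_DELIMITER_LINE.matcher(chunk).matches();
    }

    public static void main(String[] args) {
        // A line that merely ends in $$$$ fools endsWith ...
        System.out.println("price$$$$\n".endsWith("$$$$\n"));
        // ... but not the whole-line check.
        System.out.println(endsWithDelimiterLine("price$$$$\n"));
        System.out.println(endsWithDelimiterLine("text\n$$$$\n"));
        System.out.println(endsWithDelimiterLine(new StringBuilder("$$$$\n")));
    }
}
```

Compiling the pattern once avoids recompiling the regular expression for every line, which String.matches would do.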
Here is a solution based on earlier work:
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ChunkSpliterator extends Spliterators.AbstractSpliterator<List<String>> {
    private final Spliterator<String> source;
    private final Predicate<String> delimiter;
    private final Consumer<String> getChunk;
    private List<String> current;

    ChunkSpliterator(Spliterator<String> lineSpliterator, Predicate<String> mark) {
        super(lineSpliterator.estimateSize(), ORDERED|NONNULL);
        source=lineSpliterator;
        delimiter=mark;
        // appends each incoming line to the chunk being built
        getChunk=s -> {
            if(current==null) current=new ArrayList<>();
            current.add(s);
        };
    }

    public boolean tryAdvance(Consumer<? super List<String>> action) {
        // pull lines until the most recent one is a delimiter
        while(current==null || !delimiter.test(current.get(current.size()-1)))
            if(!source.tryAdvance(getChunk)) return lastChunk(action);
        // drop the delimiter line itself and emit the chunk
        current.remove(current.size()-1);
        action.accept(current);
        current=null;
        return true;
    }

    // emits the trailing chunk, if any, when the source has no more lines
    private boolean lastChunk(Consumer<? super List<String>> action) {
        if(current==null) return false;
        action.accept(current);
        current=null;
        return true;
    }

    public static Stream<List<String>> toChunks(
            Stream<String> lines, Predicate<String> splitAt, boolean parallel) {
        return StreamSupport.stream(
            new ChunkSpliterator(lines.spliterator(), splitAt),
            parallel);
    }
}
You can use it like this:
try(Stream<String> lines=Files.lines(pathToYourFile)) {
    ChunkSpliterator.toChunks(
        lines,
        Pattern.compile("^\\Q$$$$\\E$").asPredicate(),
        false)
    /* chain your stream operations, e.g.
    .forEach(s -> { s.forEach(System.out::print); System.out.println(); })
    */;
}
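You can use a Scanner as an iterator and create a stream from it:
private static Stream<String> recordStreamOf(Readable source) {
    Scanner scanner = new Scanner(source);
    // useDelimiter takes a regular expression, so the literal $$$$ must be quoted
    scanner.useDelimiter(Pattern.quote("$$$$"));
    return StreamSupport
        .stream(Spliterators.spliteratorUnknownSize(scanner, Spliterator.ORDERED | Spliterator.NONNULL), false)
        .onClose(scanner::close);
}
This keeps the newlines in the chunks, for further filtering or splitting.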
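As a follow-up to the Scanner approach, the sketch below shows one way the surrounding newlines could be stripped afterwards. The class ScannerChunks and its methods are hypothetical names for this example. Because Scanner.useDelimiter interprets its argument as a regular expression, the sketch quotes the delimiter with Pattern.quote.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ScannerChunks {

    // Streams the raw text between occurrences of the literal delimiter.
    static Stream<String> records(Readable source, String delimiter) {
        Scanner scanner = new Scanner(source);
        scanner.useDelimiter(Pattern.quote(delimiter)); // literal match, not a regex
        return StreamSupport
            .stream(Spliterators.spliteratorUnknownSize(
                scanner, Spliterator.ORDERED | Spliterator.NONNULL), false)
            .onClose(scanner::close);
    }

    // Trims surrounding whitespace (including newlines) and drops empty records.
    static List<String> trimmedRecords(Readable source, String delimiter) {
        try (Stream<String> s = records(source, delimiter)) {
            return s.map(String::trim)
                    .filter(r -> !r.isEmpty())
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        // In-memory input keeps the demo self-contained.
        String text = "a\nb\n$$$$\nc\n$$$$\n";
        trimmedRecords(new StringReader(text), "$$$$")
            .forEach(r -> System.out.println("[" + r + "]"));
    }
}
```

Note that this variant splits on every occurrence of the delimiter text, not only on lines that consist solely of it, so it suits input where $$$$ never appears inside a regular line.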