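How to count words in a text file, Java 8-style
I'm working on an assignment that first counts the files in a directory and then gives the word count for each file. My file count works fine, but I'm having a hard time converting some code my teacher gave me, from a class that does a frequency count, into a simpler word count. I also can't seem to find the right code to look at each file and count its words (I'm trying to find something "generic" rather than specific, but so far I've been testing the program with one specific text file). This is the expected output: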
Count 11 files:
word length: 1 ==> 80
word length: 2 ==> 321
word length: 3 ==> 643
However, this is what actually gets printed:
primes.txt
but
are
sometimes
sense
refrigerator
make
haiku
dont
they
funny
word length: 1 ==> {but=1, are=1, sometimes=1, sense=1, refrigerator=1, make=1, haiku=1, dont=1, they=1, funny=1}
.....
Count 11 files:
I am using two classes: WordCount and FileCatch8.
WordCount:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Map;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
/**
*
* @author
*/
public class WordCount {
/**
*
* @param filename
* @return
* @throws java.io.IOException
*/
public Map<String, Long> count(String filename) throws IOException {
//Stream<String> lines = Files.lines(Paths.get(filename));
Path path = Paths.get("haiku.txt");
Map<String, Long> wordMap = Files.lines(path)
.parallel()
.flatMap(line -> Arrays.stream(line.trim().split(" ")))
.map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase().trim())
.filter(word -> word.length() > 0)
.map(word -> new SimpleEntry<>(word, 1))
//.collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));
.collect(groupingBy(SimpleEntry::getKey, counting()));
wordMap.forEach((k, v) -> System.out.println(String.format(k,v)));
return wordMap;
}
}
And FileCatch8:
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
/*
* To change this license header, choose License Headers in Project Properties.
* To change this template file, choose Tools | Templates
* and open the template in the editor.
*/
/**
*
* @author
*/
public class FileCatch8 {
public static void main(String args[]) {
List<String> fileNames = new ArrayList<>();
try {
DirectoryStream<Path> directoryStream = Files.newDirectoryStream
(Paths.get("files"));
int fileCounter = 0;
WordCount wordCnt = new WordCount();
for (Path path : directoryStream) {
System.out.println(path.getFileName());
fileCounter++;
fileNames.add(path.getFileName().toString());
System.out.println("word length: " + fileCounter + " ==> " +
wordCnt.count(path.getFileName().toString()));
}
} catch(IOException ex){
}
System.out.println("Count: "+fileNames.size()+ " files");
}
}
The program uses Java 8 streams with lambda syntax.
Word count example:
Files.lines(Paths.get(file))
.flatMap(line -> Arrays.stream(line.trim().split(" ")))
.map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase().trim())
.filter(word -> !word.isEmpty())
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
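The pipeline above can be wrapped into a small runnable class, sketched below. The class name WordFrequency and the helpers frequencies and totalWords are made up for illustration. The sketch assumes the file is UTF-8 text, splits on any run of whitespace instead of a single space, and closes the stream with try-with-resources, because Files.lines keeps the file open until the stream is closed. Summing the map's values gives the plain word count the question asks for.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordFrequency {

    // Hypothetical helper: maps each word in the file to its number of occurrences.
    public static Map<String, Long> frequencies(Path file) throws IOException {
        // try-with-resources closes the file handle opened by Files.lines
        try (Stream<String> lines = Files.lines(file)) {
            return lines
                    .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                    .map(word -> word.replaceAll("[^a-zA-Z]", "").toLowerCase())
                    .filter(word -> !word.isEmpty())
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        }
    }

    // Hypothetical helper: total number of words, i.e. the sum of all frequencies.
    public static long totalWords(Path file) throws IOException {
        return frequencies(file).values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) throws IOException {
        // Sample data only; the real haiku.txt from the question is not available here.
        Path file = Files.createTempFile("haiku", ".txt");
        Files.write(file, Arrays.asList(
                "Haiku are funny",
                "but sometimes they dont make sense",
                "refrigerator"));

        System.out.println(frequencies(file)); // map order is not guaranteed
        System.out.println(totalWords(file));  // prints 10
    }
}
```

Passing a Path in, rather than hard-coding a file name inside the method, is what lets the same method be reused for every file in a directory.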
File count:
Files.walk(Paths.get(file), Integer.MAX_VALUE).count();
Files.walk(Paths.get(file)).count();
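One caveat with the two lines above: Files.walk also emits the starting directory itself and every subdirectory, so count() comes out higher than the number of files. A minimal sketch that counts regular files only is shown below; the class name FileCount and the helper countRegularFiles are hypothetical, and the temporary directory exists only to make the example self-contained.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class FileCount {

    // Hypothetical helper: counts regular files under dir, including nested ones.
    public static long countRegularFiles(Path dir) throws IOException {
        // Files.walk holds open directory handles, so close the stream when done
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample layout: two files and one empty subdirectory
        Path dir = Files.createTempDirectory("files");
        Files.createFile(dir.resolve("haiku.txt"));
        Files.createFile(dir.resolve("primes.txt"));
        Files.createDirectory(dir.resolve("sub"));

        try (Stream<Path> all = Files.walk(dir)) {
            System.out.println(all.count());        // prints 4: dir itself, two files, "sub"
        }
        System.out.println(countRegularFiles(dir)); // prints 2
    }
}
```

If only the top level of the directory matters, Files.list(dir) with the same filter should work as well, since it does not descend into subdirectories.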
In my opinion, the simplest way to count the words in a file with Java 8 is:
Long wordsCount = Files.lines(Paths.get(file))
.flatMap(str->Stream.of(str.split("[ ,.!?\r\n]")))
.filter(s->s.length()>0).count();
System.out.println(wordsCount);
And to count all the files:
Long filesCount = Files.walk(Paths.get(file)).count();
System.out.println(filesCount);
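Putting the two snippets together for the original assignment might look like the sketch below. It rests on a few assumptions: each "word length: N ==> M" line in the expected output is read as "file number N contains M words" (the question's loop prints its file counter in that position), the directory defaults to "files" as in FileCatch8, and the names DirectoryWordCount, countWords and regularFiles are invented for this example. It also passes the full Path of each file to the counting method. The question's code passes only path.getFileName(), which drops the directory, and its count method ignores the argument and always reads "haiku.txt".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirectoryWordCount {

    // Hypothetical helper: number of non-empty tokens in the file,
    // using the same separator set as the snippet above.
    public static long countWords(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.flatMap(line -> Stream.of(line.split("[ ,.!?\r\n]")))
                        .filter(s -> !s.isEmpty())
                        .count();
        }
    }

    // Hypothetical helper: regular files directly inside dir, sorted by path.
    public static List<Path> regularFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // "files" mirrors the directory name used in the question
        Path dir = Paths.get(args.length > 0 ? args[0] : "files");
        List<Path> files = regularFiles(dir);

        System.out.println("Count " + files.size() + " files:");
        for (int i = 0; i < files.size(); i++) {
            // Pass the full path so the file is found inside dir
            System.out.println("word length: " + (i + 1) + " ==> " + countWords(files.get(i)));
        }
    }
}
```

Letting IOException propagate out of main, instead of catching it in an empty block as the question's code does, makes a missing directory or unreadable file show up immediately.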