How to read a text file into an array, count the number of occurrences and then display the count
How to count the number of occurrences of words in a text
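I'm working on a project to write a program that finds the 10 most frequently used words in a text, but I'm stuck and don't know what to do next. Can someone help me?
This is as far as I got: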
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
public class Lab4 {
public static void main(String[] args) throws FileNotFoundException {
Scanner file = new Scanner(new File("text.txt")).useDelimiter("[^a-zA-Z]+");
List<String> words = new ArrayList<String>();
while (file.hasNext()){
String tx = file.next();
// String x = file.next().toLowerCase();
words.add(tx);
}
Collections.sort(words);
// System.out.println(words);
}
}
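You can use a Guava Multiset. Here is a word-count example: http://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained
And here is how to find the words with the highest counts in a Multiset: "Simplest way to iterate through a Multiset in the order of element frequency?"
Update: I wrote this answer in 2012. Since then we have Java 8, and it is now possible to find the 10 most used words in a few lines without external libraries: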
List<String> words = ...
// map the words to their count
Map<String, Integer> frequencyMap = words.stream()
.collect(toMap(
s -> s, // key is the word
s -> 1, // value is 1
Integer::sum)); // merge function counts the identical words
// find the top 10
List<String> top10 = words.stream()
.sorted(comparing(frequencyMap::get).reversed()) // sort by descending frequency
.distinct() // take only unique values
.limit(10) // take only the first 10
.collect(toList()); // put it in a returned list
System.out.println("top10 = " + top10);
The static imports are:
import static java.util.Comparator.comparing;
import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toMap;
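Since the snippet above leaves `words` elided, here is a minimal self-contained sketch of the same idea. The class name `TopWordsSketch` and the method `topWords` are hypothetical, not part of the answer. It also differs in one respect: instead of sorting the full word list and calling `distinct()`, it sorts the entries of the frequency map, and it breaks ties alphabetically so the output is deterministic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class TopWordsSketch {

    // Returns up to `limit` distinct words, most frequent first (ties broken alphabetically).
    static List<String> topWords(List<String> words, int limit) {
        // map each word to its count, as in the answer above
        Map<String, Integer> frequencyMap = words.stream()
                .collect(Collectors.toMap(s -> s, s -> 1, Integer::sum));

        // sort the (word, count) entries rather than the whole word list
        return frequencyMap.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "the", "dog", "the", "cat");
        System.out.println(topWords(words, 2)); // prints [the, cat]
    }
}
```

Sorting the map entries means only the distinct words are sorted, which may matter for large texts.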
Create a map to keep track of the occurrences, like this:
Scanner file = new Scanner(new File("text.txt")).useDelimiter("[^a-zA-Z]+");
HashMap<String, Integer> map = new HashMap<>();
while (file.hasNext()){
String word = file.next().toLowerCase();
if (map.containsKey(word)) {
map.put(word, map.get(word) + 1);
} else {
map.put(word, 1); // first occurrence counts as 1, not 0
}
}
ArrayList<Map.Entry<String, Integer>> entries = new ArrayList<>(map.entrySet());
Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
@Override
public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
return a.getValue().compareTo(b.getValue());
}
});
for(int i = 0; i < 10 && i < entries.size(); i++){ // guard against fewer than 10 distinct words
System.out.println(entries.get(entries.size() - i - 1).getKey());
}
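On Java 8 or later, the `containsKey` / `put` branching above can be collapsed into a single `Map.merge` call. The following is a small sketch of that variant, not the answer's own code. The class `MergeCountSketch` and the method `countWords` are hypothetical names, and the sketch takes its input as a `String` rather than a file to keep it self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper class, for illustration only.
class MergeCountSketch {

    // Counts lower-cased words in `text`; any run of non-letters separates words.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("[^a-zA-Z]+")) {
            while (scanner.hasNext()) {
                // inserts 1 for a new word, otherwise adds 1 to the existing count
                counts.merge(scanner.next().toLowerCase(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat and the dog and the bird.");
        System.out.println("the -> " + counts.get("the")); // prints the -> 3
        System.out.println("and -> " + counts.get("and")); // prints and -> 2
    }
}
```

The resulting map can then be sorted by value exactly as in the code above.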
package src;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.Map.Entry;
public class ScannerTest
{
public static void main(String[] args) throws FileNotFoundException
{
Scanner scanner = new Scanner(new File("G:/Script_nt.txt")).useDelimiter("[^a-zA-Z]+");
Map<String, Integer> map = new HashMap<String, Integer>();
while (scanner.hasNext())
{
String word = scanner.next();
if (map.containsKey(word))
{
map.put(word, map.get(word)+1);
}
else
{
map.put(word, 1);
}
}
List<Map.Entry<String, Integer>> entries = new ArrayList<Entry<String,Integer>>( map.entrySet());
Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
@Override
public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
return a.getValue().compareTo(b.getValue());
}
});
for(int i = 0; i < map.size(); i++){
System.out.println(entries.get(entries.size() - i - 1).getKey()+" "+entries.get(entries.size() - i - 1).getValue());
}
}
}
Here is an even shorter version than lbalazscs's, which also uses the Java 8 streams API:
Arrays.stream(new String(Files.readAllBytes(PATH_TO_FILE), StandardCharsets.UTF_8).split("\\W+"))
.collect(Collectors.groupingBy(Function.<String>identity(), HashMap::new, Collectors.counting()))
.entrySet()
.stream()
.sorted(((o1, o2) -> o2.getValue().compareTo(o1.getValue())))
.limit(10)
.forEach(System.out::println);
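This does everything in one go: it loads the file, splits it on non-word characters, groups the pieces by word, assigns each group its word count, and then prints the ten most frequent words together with their counts.
For an in-depth discussion of a very similar setup, see: https://stackoverflow.com/a/33946927/327301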
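To make the pipeline above easier to try out, here is a hedged sketch that applies the same `groupingBy` / `counting` steps to a `String` instead of a file. The class `GroupingCountSketch` and the method `topWords` are hypothetical names. Two details are additions that are not in the answer: empty tokens are filtered out, because `split("\\W+")` yields an empty first element when the text starts with a non-word character, and ties are broken alphabetically.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class GroupingCountSketch {

    // Returns up to `limit` (word, count) entries, highest count first.
    static List<Map.Entry<String, Long>> topWords(String text, int limit) {
        return Arrays.stream(text.split("\\W+"))
                .filter(word -> !word.isEmpty()) // drop the empty leading token, if any
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet()
                .stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "\"To be, or not to be\": to be is the question.";
        // lower-case first so that "To" and "to" are counted together
        topWords(text.toLowerCase(), 3).forEach(System.out::println);
    }
}
```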
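Take the input as a string, read from a file or from the command line, and pass it to the method below. It returns a map with the words as keys and, as values, the number of times each word occurs in the sentence or paragraph.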
public Map<String,Integer> getWordsWithCount(String sentances)
{
Map<String,Integer> wordsWithCount = new HashMap<String, Integer>();
String[] words = sentances.split(" ");
for (String word : words)
{
if(wordsWithCount.containsKey(word))
{
wordsWithCount.put(word, wordsWithCount.get(word)+1);
}
else
{
wordsWithCount.put(word, 1);
}
}
return wordsWithCount;
}