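How to compare String Array and count similar words
I have been trying to get this code to work, but I still can't. This snippet is the closest I have managed to get. What am I missing? I am trying to write the code without using hashing.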
// Read all the words from the dictionary (text.txt) into an array
BufferedReader br = new BufferedReader(new FileReader("text.txt"));
int bufferLength = 1000000;
char[] buffer = new char[bufferLength];
int charsRead = br.read(buffer, 0, bufferLength);
br.close();
String text = new String(buffer);
text = text.trim();
text = text.toLowerCase();
String[] words = text.split("\n");
System.out.println("Total number of words in text: " + words.length);
//Find unique words:
String[] uniqueText = words;
int[] uniqueTextCount = new int[uniqueText.length];
for (int i = 0; i < words.length; i++) {
for (int j = 0; j < uniqueText.length; j++) {
if (words[i].equals(uniqueText[j])) {
uniqueTextCount[j]++;
} else {
uniqueText[i] = words[i];
}
}
System.out.println(uniqueText[i] + " for " + uniqueTextCount[i]);
}
}
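For comparison, here is a minimal sketch of how the nested loop could be made to work with plain arrays only. It assumes the main problems in the snippet above are that uniqueText refers to the same array as words (so no separate list of distinct words is ever built) and that each count is printed before the outer loop has finished. The class name ArrayWordCount and the method countDistinct are illustrative, not from the original post.

```java
public class ArrayWordCount {

    /**
     * Fills uniqueText and uniqueTextCount with the distinct words and their
     * counts, in order of first appearance, and returns how many distinct
     * words were found. Uses only arrays and linear search, no hashing.
     */
    public static int countDistinct(String[] words, String[] uniqueText, int[] uniqueTextCount) {
        int uniqueSize = 0; // number of slots of uniqueText in use
        for (String word : words) {
            // Linear search among the distinct words seen so far.
            int j = 0;
            while (j < uniqueSize && !uniqueText[j].equals(word)) {
                j++;
            }
            if (j == uniqueSize) {
                // Not found: register a new distinct word.
                uniqueText[uniqueSize++] = word;
            }
            uniqueTextCount[j]++;
        }
        return uniqueSize;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "pear", "apple", "fig", "pear", "apple"};
        String[] uniqueText = new String[words.length];   // a separate array, not an alias
        int[] uniqueTextCount = new int[words.length];
        int n = countDistinct(words, uniqueText, uniqueTextCount);
        // Print only after all words have been counted.
        for (int i = 0; i < n; i++) {
            System.out.println(uniqueText[i] + " for " + uniqueTextCount[i]);
        }
    }
}
```

This runs in time proportional to (number of words) x (number of distinct words), which is the price of avoiding a map.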
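Based on your original code, I assume that text.txt contains one word per line. Perhaps the first thing to note is that BufferedReader allows you to read line by line: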
for (String line; (line = br.readLine()) != null; ) {
    // Process each line, which in this case is a word.
}
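Processing the file line by line is preferable to reading it all at once: reading the whole file requires memory proportional to the file size, whereas line-by-line processing only needs to hold the current line.
Now, if we consider the requirements, the desired output is a mapping from each distinct word to its count. The map should be declared before the for loop above.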
// A HashMap would also work, but you have specified that you do not want
// to use hashing.
Map<String, Integer> distinctWordCounts = new TreeMap<>();
With the map initialized, in each iteration of the loop (that is, for each line we encounter) we can do the following:
if (distinctWordCounts.containsKey(line)) {
    // We have seen this word before. Increment its count.
    distinctWordCounts.put(line, distinctWordCounts.get(line) + 1);
} else {
    // We have never seen this word. Set its count to 1.
    distinctWordCounts.put(line, 1);
}
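The code above has somewhat more overhead than is ideal, because the if branch traverses the map three times (once for the key check, once for the get, and once for the put), and at least one of those traversals can be avoided. But that is perhaps a story for another day, unless you have a reason to care about non-asymptotic speed improvements.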
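One way to trim the update, sketched here as an option rather than as the answer's intended fix, is Map.merge (available since Java 8), which expresses "insert 1 or add 1 to the existing count" as a single call. How many traversals this actually saves depends on the map implementation and JDK version. The class name MergeCount and the method count are illustrative.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

public class MergeCount {

    /** Counts occurrences of each line using a TreeMap and Map.merge. */
    public static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> distinctWordCounts = new TreeMap<>();
        for (String line : lines) {
            // If the key is absent, store 1; otherwise store oldValue + 1.
            distinctWordCounts.merge(line, 1, Integer::sum);
        }
        return distinctWordCounts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("pear", "apple", "pear"));
        System.out.println(counts); // prints {apple=1, pear=2}
    }
}
```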
Finally, we can iterate over distinctWordCounts to print the count for each word:
for (Entry<String, Integer> entry : distinctWordCounts.entrySet()) {
    System.out.println(entry.getKey() + " occurs " + entry.getValue() + " times.");
}
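Putting the pieces of this answer together, a self-contained sketch might look like the following. The class name LineWordCount, the trimming and lowercasing of each line (carried over from the question's code), the skipping of blank lines, and the built-in sample input used when no file path is passed are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

public class LineWordCount {

    /** Reads one word per line and returns a sorted map from word to count. */
    public static Map<String, Integer> count(BufferedReader br) throws IOException {
        Map<String, Integer> distinctWordCounts = new TreeMap<>();
        for (String line; (line = br.readLine()) != null; ) {
            String word = line.trim().toLowerCase();
            if (word.isEmpty()) {
                continue; // assumption: blank lines are not words
            }
            if (distinctWordCounts.containsKey(word)) {
                distinctWordCounts.put(word, distinctWordCounts.get(word) + 1);
            } else {
                distinctWordCounts.put(word, 1);
            }
        }
        return distinctWordCounts;
    }

    public static void main(String[] args) throws IOException {
        // Pass a path such as text.txt as the first argument; without one,
        // a small in-memory sample is used so the sketch runs anywhere.
        Reader source = args.length > 0
                ? new FileReader(args[0])
                : new StringReader("Apple\npear\napple\n");
        try (BufferedReader br = new BufferedReader(source)) {
            for (Map.Entry<String, Integer> entry : count(br).entrySet()) {
                System.out.println(entry.getKey() + " occurs " + entry.getValue() + " times.");
            }
        }
    }
}
```

Only the current line and the map of distinct words are held in memory, so the file itself can be much larger than the 1,000,000-character buffer used in the question.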
It sounds like you just want to count how many times each distinct word occurs? If so, you can do something like this:
String[] array = {"a", "a", "b", "c", "c", "c", "d", "e", "f", "f"};
Map<String, Long> map = new HashMap<>();
Stream.of(array)
.distinct()
.forEach(s -> map.put(s,
Stream.of(array)
.filter(s::equals)
.count()));
If you only want the unique words:
String[] unique = Stream.of(array)
.distinct()
.toArray(String[]::new);
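Note that the counting stream above re-scans the whole array once per distinct word and stores the results in a HashMap. As a hedged alternative, the sketch below counts in a single pass with Collectors.groupingBy and Collectors.counting, and passes TreeMap::new as the map factory to respect the question's wish to avoid hashing. The class name GroupingCount is illustrative.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupingCount {

    /** Groups equal strings together and counts each group in one pass. */
    public static Map<String, Long> count(String[] array) {
        return Stream.of(array)
                .collect(Collectors.groupingBy(
                        Function.identity(),     // group by the word itself
                        TreeMap::new,            // sorted, non-hashing map
                        Collectors.counting())); // count members of each group
    }

    public static void main(String[] args) {
        String[] array = {"a", "a", "b", "c", "c", "c", "d", "e", "f", "f"};
        System.out.println(count(array)); // prints {a=2, b=1, c=3, d=1, e=1, f=2}
    }
}
```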