How do I count occurrences of words in a line?
I am fairly new to Java. I want to count the occurrences of words in a particular line. So far I can only count the words, but I have no idea how to count occurrences.
Is there a simple way to do this?
Scanner file = new Scanner(new FileInputStream("/../output.txt"));
int count = 0;
while (file.hasNextLine()) {
    String s = file.nextLine();
    count++;
    if (s.contains("#AVFC")) {
        System.out.printf("There are %d words on this line ", s.split("\\s").length - 1);
        System.out.println(count);
    }
}
file.close();
Output:
There are 4 words on this line 1
There are 8 words on this line 13
There are 3 words on this line 16
The simplest way I can think of is to use String.split("\\s"), which splits the line on whitespace.
Then use a HashMap with each word as the key and the number of times it occurs as the value.
HashMap<String, Integer> mapOfWords = new HashMap<String, Integer>();
while (file.hasNextLine()) {
    String s = file.nextLine();
    String[] words = s.split("\\s");
    int count;
    for (String word : words) {
        if (mapOfWords.get(word) == null) {
            // First time this word is seen
            mapOfWords.put(word, 1);
        }
        else {
            count = mapOfWords.get(word);
            mapOfWords.put(word, count + 1);
        }
    }
}
Here is the implementation you asked for, which skips lines that contain certain words:
HashMap<String, Integer> mapOfWords = new HashMap<String, Integer>();
while (file.hasNextLine()) {
    String s = file.nextLine();
    // Skip lines that contain any of the unwanted words
    if (!isStringWanted(s)) {
        continue;
    }
    String[] words = s.split("\\s");
    int count;
    for (String word : words) {
        if (mapOfWords.get(word) == null) {
            mapOfWords.put(word, 1);
        }
        else {
            count = mapOfWords.get(word);
            mapOfWords.put(word, count + 1);
        }
    }
}
// Returns false if the line contains any of the listed strings (case-sensitive)
private boolean isStringWanted(String s) {
    String[] checkStrings = new String[] {"chelsea", "Liverpool", "#LFC"};
    for (String check : checkStrings) {
        if (s.contains(check)) {
            return false;
        }
    }
    return true;
}
Try the code below; it may solve your problem. In addition, you can call String.toLowerCase() on each word before putting it into the HashMap.
String line = "a a b b b b a q c c";
...
Map<String, Integer> map = new HashMap<String, Integer>();
Scanner scanner = new Scanner(line);
while (scanner.hasNext()) {
    String s = scanner.next();
    // put() returns the previous value, or null if the word was not yet in the map
    Integer count = map.put(s, 1);
    if (count != null) map.put(s, count + 1);
}
...
System.out.println(map);
Result:
{b=4, c=2, q=1, a=3}
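The toLowerCase() suggestion above could look roughly like the following. This is only a sketch: the class and method names are made up for illustration, Map.merge (Java 8+) stands in for the explicit null check, and Locale.ROOT is an assumption to keep the lowercasing independent of the default locale.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of the original answers.
class CaseInsensitiveWordCount {

    // Counts words in a single line, treating "Goal" and "goal" as the same word.
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // a blank line yields a single empty token
            }
            // merge() inserts 1 for a new key, otherwise adds 1 to the old value
            counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("a A b B b b a q c C"));
    }
}
```

The iteration order of a HashMap is not guaranteed, so the printed order of the entries may differ between Java versions; use a TreeMap if you need sorted output.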
Check out Guava's Multiset. Its description starts with 'The traditional Java idiom for e.g. counting how many times a word occurs in a document is something like:'. There you will also find code snippets showing how to do that without a Multiset.
BTW: If you only want to count the number of words in your string, why not just count the spaces? You could use StringUtils from Apache Commons. That avoids creating an array of the split parts. Also have a look at their implementation.
int count = StringUtils.countMatches(string, " ");
The quickest way is to store the split words in an ArrayList, then iterate over the ArrayList and use [Collections.frequency]( http://www.tutorialspoint.com/java/util/collections_frequency.htm ).
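A minimal sketch of the Collections.frequency approach, assuming the line is split on whitespace first. The class and method names are hypothetical and only serve the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class FrequencyExample {

    // Returns how many times 'word' appears as a whole token in 'line'.
    static int occurrences(String line, String word) {
        List<String> words = new ArrayList<>(Arrays.asList(line.trim().split("\\s+")));
        return Collections.frequency(words, word);
    }

    public static void main(String[] args) {
        String line = "a a b b b b a q c c";
        System.out.println("b occurs " + occurrences(line, "b") + " times");
    }
}
```

Each call to Collections.frequency scans the whole list, so this is convenient when you care about one or two words. If you need the count of every distinct word, a single pass with a HashMap does less work.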
Within a given String, occurrences of another String can be counted using String#indexOf(String, int) in a loop:
String haystack = "This is a string";
String needle = "i";
// Start at the first match so that a match at index 0 is not skipped
int index = haystack.indexOf(needle);
while (index != -1) {
    System.out.println(String.format("Found %s in %s at index %s.", needle, haystack, index));
    // Continue searching just past the current match
    index = haystack.indexOf(needle, index + 1);
}
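The loop above reports positions rather than a total. A small variation that returns the count might look like this; the class and method names are invented for the example, and treating an empty needle as zero matches is an assumption, not something the original answer specifies.

```java
// Hypothetical helper class, not part of the original answer.
class SubstringCounter {

    // Counts occurrences of 'needle' in 'haystack', including overlapping ones.
    static int countOccurrences(String haystack, String needle) {
        if (needle.isEmpty()) {
            return 0; // assumption: an empty needle counts as no match
        }
        int count = 0;
        int index = haystack.indexOf(needle);
        while (index != -1) {
            count++;
            // Advance by one character so overlapping matches are still found
            index = haystack.indexOf(needle, index + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("This is a string", "i"));
    }
}
```

Note that this counts substrings, not whole words: searching for "is" in "This is a string" also matches the "is" inside "This". To skip overlapping matches instead, advance by needle.length() rather than 1.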