Store occurrences of words in a file and their counts, using Scanner (Java)
Here's the code:
Scanner scan = new Scanner(new FileReader ("C:\\mytext.txt"));
HashMap<String, Integer> listOfWords = new HashMap<String, Integer>();
while(scan.hasNextLine())
{
    Scanner innerScan = new Scanner(scan.nextLine());
    boolean wordExistence ;
    while(wordExistence = innerScan.hasNext())
    {
        String word = innerScan.next();
        int countWord = 0;
        if(!listOfWords.containsKey(word)){ // word not seen already
            listOfWords.put(word, 1);
        }else{
            countWord = listOfWords.get(word) + 1;
            listOfWords.remove(word);
            listOfWords.put(word, countWord);
        }
    }
}
System.out.println(listOfWords.toString());
The problem is that my output contains entries like:
document.Because=1
document.This=1
space.=1
How do I handle these full stops? (More generally, I think any sentence terminator, such as a question mark or exclamation mark, would cause the same problem.)
Check the class description in the Scanner API, especially the paragraph on using delimiters other than whitespace.
Scanner uses any whitespace as the default delimiter. You can call useDelimiter() on the Scanner instance and specify your own regexp to be used as the delimiter.
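As a minimal sketch of the useDelimiter() suggestion: the regexp "\\W+" (one or more non-word characters) is an assumption about what should separate words, and the class name DelimiterDemo and the method countWords are made up for illustration. With that delimiter, spaces, full stops, question marks and exclamation marks all end a word.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class DelimiterDemo {
    // Counts words in the given text, splitting on runs of non-word characters.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scan = new Scanner(text)) {
            // Assumption: anything that is not a letter, digit or underscore separates words.
            scan.useDelimiter("\\W+");
            while (scan.hasNext()) {
                String word = scan.next();
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "document.Because" is now counted as "document" and "Because".
        System.out.println(countWords("This is a document.Because this has space."));
    }
}
```

Note that this delimiter also splits on apostrophes and hyphens ("don't" becomes "don" and "t"), and the counts stay case-sensitive ("This" and "this" are separate keys). If that matters, the regexp would need adjusting.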
Maybe you want to tinker with the following answer for speed optimization.
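final Pattern WORD = Pattern.compile("\\w+");
while(scan.hasNextLine())
{
    Scanner innerScan = new Scanner(scan.nextLine());
    // Split on non-word characters. With the default whitespace delimiter,
    // hasNext(WORD) is false at the first token such as "space." and the
    // rest of the line would be skipped.
    innerScan.useDelimiter("\\W+");
    while(innerScan.hasNext(WORD))
    {
        String word = innerScan.next(WORD);
        if(!listOfWords.containsKey(word)){
            listOfWords.put(word, 1);
        }else{
            int countWord = listOfWords.get(word) + 1;
            // put() replaces the old value, so remove() is not needed
            listOfWords.put(word, countWord);
        }
    }
}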
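One way to tinker with this, offered as a sketch rather than a measured improvement: drop the inner Scanner and let a java.util.regex.Matcher pull the \w+ runs out of each line, updating the map with Map.merge (Java 8 or later). The class name WordMatcherDemo and the method countWords are hypothetical, and whether this is actually faster for a given file is something to measure, not assume.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatcherDemo {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Reads the source line by line and counts every run of word characters.
    static Map<String, Integer> countWords(Reader source) {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = WORD.matcher(line);
                // find() skips over punctuation and whitespace between words.
                while (m.find()) {
                    // Inserts 1 for a new word, otherwise adds 1 to the stored count.
                    counts.merge(m.group(), 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the FileReader used in the question.
        Reader in = new StringReader("space. This is a document.This too!\nMore space?");
        System.out.println(countWords(in));
    }
}
```

To run it against the file from the question, pass new FileReader("C:\\mytext.txt") instead of the StringReader and handle the FileNotFoundException it can throw.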