Here's the code:
Scanner scan = new Scanner(new FileReader ("C:\\mytext.txt"));
HashMap<String, Integer> listOfWords = new HashMap<String, Integer>();
while(scan.hasNextLine())
{
Scanner innerScan = new Scanner(scan.nextLine());
boolean wordExistence ;
while(wordExistence = innerScan.hasNext())
{
String word = innerScan.next();
int countWord = 0;
if(!listOfWords.containsKey(word)){ // word is not in the map already
listOfWords.put(word, 1);
}else{
countWord = listOfWords.get(word) + 1;
listOfWords.remove(word);
listOfWords.put(word, countWord);
}
}
}
System.out.println(listOfWords.toString());
The problem is that my output contains entries like:
document.Because=1
document.This=1
space.=1
How do I handle the full stops that are showing up? (Looking ahead, I think any sentence terminator, such as a question mark or an exclamation mark, would cause the same issue.)
Look at the class description in the Scanner API, especially the paragraph on using delimiters other than whitespace.
Scanner uses any whitespace as the default delimiter. You can call useDelimiter() on the Scanner instance and specify your own regular expression to be used as the delimiter.
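As a minimal sketch of that suggestion, the snippet below sets the delimiter to any run of non-word characters, so punctuation is dropped along with whitespace. The class name DelimiterDemo and the helper words() are hypothetical names used only for illustration, and note that \W also treats apostrophes and hyphens as separators, which may or may not be what you want.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Hypothetical helper: splits a line into words, treating any run of
    // non-word characters (whitespace, '.', '?', '!', ...) as one delimiter.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(line)) {
            sc.useDelimiter("\\W+");
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [This, is, a, document, Because, of, that, space]
        System.out.println(words("This is a document.Because of that, space."));
    }
}
```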
The following should work as a starting point; you may want to tinker with it to optimize for speed.
final Pattern WORD = Pattern.compile("\\w+");
while(scan.hasNextLine())
{
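Scanner innerScan = new Scanner(scan.nextLine());
// Split on runs of non-word characters; with the default whitespace delimiter,
// a token like "space." would fail hasNext(WORD) and end the loop early.
innerScan.useDelimiter("\\W+");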
while(innerScan.hasNext(WORD))
{
String word = innerScan.next(WORD);
if(!listOfWords.containsKey(word)){
listOfWords.put(word, 1);
}else{
int countWord = listOfWords.get(word) + 1;
//listOfWords.remove(word);
listOfWords.put(word, countWord);
}
}
}
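If you do want to tinker, one possible variation (a sketch, not a measured optimization) is to skip the inner Scanner entirely and let a Matcher find each \w+ run in the line, with Map.merge replacing the containsKey/get/put sequence. The names WordCounter and count are hypothetical, and the sketch assumes Java 8 or later for Map.merge.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Hypothetical helper: counts every \w+ run across the given lines.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            Matcher m = WORD.matcher(line);
            while (m.find()) {
                // Inserts 1 for a new word, otherwise adds 1 to the existing count.
                counts.merge(m.group(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                count(Arrays.asList("a document.Because a", "space."));
        System.out.println(counts.get("a"));            // prints 2
        System.out.println(counts.containsKey("space")); // prints true
    }
}
```

To use it with the file from the question, you could read each line with scan.nextLine() into a list, or adapt count() to take the Scanner directly.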