Store occurrences of words in a file and their counts, using Scanner (Java)

Here's the code:

        import java.io.FileReader;
        import java.util.HashMap;
        import java.util.Scanner;

        // Inside a method that declares "throws FileNotFoundException":
        Scanner scan = new Scanner(new FileReader("C:\\mytext.txt"));
        HashMap<String, Integer> listOfWords = new HashMap<String, Integer>();

        while (scan.hasNextLine())
        {
            // Split each line into tokens using the default (whitespace) delimiter
            Scanner innerScan = new Scanner(scan.nextLine());
            while (innerScan.hasNext())
            {
                String word = innerScan.next();
                if (!listOfWords.containsKey(word)) { // not counted already
                    listOfWords.put(word, 1);
                } else {
                    // put() replaces the old value, so no remove() is needed
                    listOfWords.put(word, listOfWords.get(word) + 1);
                }
            }
            innerScan.close();
        }
        scan.close();

        System.out.println(listOfWords.toString());

The problem is that my output contains entries like:

document.Because=1 document.This=1 space.=1

How do I handle these full stops? (Looking further ahead, I think any sentence terminator, such as a question mark or an exclamation mark, would cause the same issue.)

Look at the class description in the Scanner API documentation, in particular the paragraph about using delimiters other than whitespace.

Scanner uses any whitespace as its default delimiter. You can call useDelimiter() on the Scanner instance and pass your own regular expression to be used as the delimiter.
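
A minimal sketch of that idea, assuming that whitespace, full stops, question marks and exclamation marks are the only separators you care about: a single Scanner reads the text, and useDelimiter() swaps the default whitespace delimiter for the regular expression `[\\s.?!]+`. The class and method names (`WordCount`, `countWords`) are made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class WordCount {

    // Counts words in text, treating runs of whitespace, '.', '?' and '!' as separators.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scan = new Scanner(text)) {
            // The trailing '+' makes consecutive separators (e.g. ". ") one delimiter,
            // so no empty tokens are produced.
            scan.useDelimiter("[\\s.?!]+");
            while (scan.hasNext()) {
                String word = scan.next();
                counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("This is a document.Because it ends with a space. Really?"));
    }
}
```

Because `\s` also matches line breaks, this version can read the whole text with one Scanner instead of creating one per line.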

If you want your input to be split not only on whitespace but also on full stops and question/exclamation marks, you have to define a Pattern and then apply it to your Scanner with useDelimiter() (see the Scanner documentation).
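
One way to do that with a precompiled Pattern, sketched under the assumption that the text arrives through a Reader (a FileReader in the question; a StringReader in the demo so it runs without a file). The pattern `[\\s.,;:?!]+` also treats commas, semicolons and colons as separators; that choice and the names `SentenceAwareWordCount` and `countWords` are illustrative, not taken from the answer.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;

public class SentenceAwareWordCount {

    // Whitespace plus common punctuation (assumed set); compiled once and reused.
    private static final Pattern DELIMITERS = Pattern.compile("[\\s.,;:?!]+");

    // Reads all words from the source (for example a FileReader) and counts them.
    static Map<String, Integer> countWords(Reader source) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scan = new Scanner(source)) { // closing the Scanner also closes the Reader
            scan.useDelimiter(DELIMITERS);
            while (scan.hasNext()) {
                counts.merge(scan.next(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // With a real file: countWords(new FileReader("C:\\mytext.txt"))
        Reader demo = new StringReader("Is this a document? Yes, this is a document!\nThe end.");
        System.out.println(countWords(demo));
    }
}
```

Counting is case-sensitive, as in the question's code, so "Is" and "is" are counted separately.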

Alternatively, match the words themselves instead of the delimiters. You may want to tinker with the following code to optimize it for speed.

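    import java.util.regex.Pattern;

    final Pattern WORD = Pattern.compile("\\w+");
    while (scan.hasNextLine())
    {
        Scanner innerScan = new Scanner(scan.nextLine());
        String word;
        // findInLine() returns the next run of word characters on the line,
        // skipping punctuation, or null when the line is exhausted.
        // hasNext(WORD)/next(WORD) only match a whole whitespace-separated
        // token, so they would stop at a token such as "document.Because".
        while ((word = innerScan.findInLine(WORD)) != null)
        {
            if (!listOfWords.containsKey(word)) {
                listOfWords.put(word, 1);
            } else {
                listOfWords.put(word, listOfWords.get(word) + 1);
            }
        }
        innerScan.close();
    }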
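
If creating a Scanner per line turns out to matter, one variation (not benchmarked here) is to keep a single Scanner and pull each match out with findWithinHorizon(WORD, 0), where a horizon of 0 means the search is not bounded. The class and method names are illustrative.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;

public class RegexWordCount {

    // A word is a run of letters, digits or underscores, as in the answer above.
    private static final Pattern WORD = Pattern.compile("\\w+");

    static Map<String, Integer> countWords(Reader source) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scan = new Scanner(source)) {
            String word;
            // Horizon 0: search as far as needed; returns null when no further match exists.
            while ((word = scan.findWithinHorizon(WORD, 0)) != null) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new StringReader("space. document.Because\nanother document")));
    }
}
```

Whether this is actually faster than the per-line version depends on your input, so measure before relying on it.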

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM