
Store occurrences of words in a file and their count, using Scanner (Java)

Here's the code:

        import java.io.FileReader;
        import java.util.HashMap;
        import java.util.Scanner;

        // The enclosing method must declare "throws FileNotFoundException".
        Scanner scan = new Scanner(new FileReader("C:\\mytext.txt"));
        HashMap<String, Integer> listOfWords = new HashMap<String, Integer>();

        while (scan.hasNextLine())
        {
            // Split each line on the default delimiter (whitespace).
            Scanner innerScan = new Scanner(scan.nextLine());
            while (innerScan.hasNext())
            {
                String word = innerScan.next();
                if (!listOfWords.containsKey(word)) {
                    // First occurrence of this word.
                    listOfWords.put(word, 1);
                } else {
                    // Word already seen: put() overwrites the old count,
                    // so no remove() is needed first.
                    int countWord = listOfWords.get(word) + 1;
                    listOfWords.put(word, countWord);
                }
            }
            innerScan.close();
        }
        scan.close();

        System.out.println(listOfWords.toString());

The problem is that my output contains words like:

document.Because=1 document.This=1 space.=1

How do I handle these full stops? (Looking further ahead, I think any sentence terminator, such as a question mark or exclamation mark, would cause the same problem.)

Look at the class description in the Scanner API documentation, in particular the paragraph about using delimiters other than whitespace.

Scanner uses any whitespace as its default delimiter. You can call useDelimiter() on the Scanner instance and pass your own regular expression to be used as the delimiter.
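To see the effect, the hypothetical helper tokens() below scans the same line once with the default delimiter and once with a custom one. The regex [\s.?!]+ (one or more whitespace characters, full stops, question marks or exclamation marks) is an illustrative choice, not something the Scanner API prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {

    // Collects every token a Scanner produces for the given text.
    // If delimiterRegex is null, the Scanner keeps its default (whitespace).
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            if (delimiterRegex != null) {
                sc.useDelimiter(delimiterRegex);
            }
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "the end of the document.Because it is";
        // Default delimiter: "document.Because" stays a single token.
        System.out.println(tokens(line, null));
        // prints [the, end, of, the, document.Because, it, is]

        // Custom delimiter: whitespace and '.', '?', '!' all separate tokens.
        System.out.println(tokens(line, "[\\s.?!]+"));
        // prints [the, end, of, the, document, Because, it, is]
    }
}
```

Because the character class is followed by +, a run such as ". " counts as one delimiter, so no empty tokens appear between a full stop and the following space.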

If you want your input split not only on whitespace but also on full stops and question/exclamation marks, define a Pattern that matches all of them and apply it to your Scanner with useDelimiter (see its documentation). useDelimiter also accepts the regular expression as a plain String.
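A minimal sketch of that approach, assuming words only need to be split on whitespace plus '.', '?' and '!' (the class and method names are hypothetical). Since \s inside the delimiter also matches line breaks, one Scanner over the whole file is enough; the per-line inner Scanner from the question is no longer needed.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;

public class WordCountWithDelimiter {

    // Delimiter: one or more whitespace characters or sentence terminators.
    // Extend the character class if other punctuation (',', ';', ...) matters.
    private static final Pattern DELIMITER = Pattern.compile("[\\s.?!]+");

    static Map<String, Integer> countWords(Reader source) {
        Map<String, Integer> counts = new HashMap<>();
        try (Scanner scan = new Scanner(source)) {
            scan.useDelimiter(DELIMITER);
            while (scan.hasNext()) {
                String word = scan.next();
                // getOrDefault replaces the separate containsKey()/get() calls.
                counts.put(word, counts.getOrDefault(word, 0) + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // File path taken from the question.
        try (Reader in = new FileReader("C:\\mytext.txt")) {
            System.out.println(countWords(in));
        }
    }
}
```

Words are still case-sensitive here, so "This" and "this" are counted separately, just as in the original code.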

The following version matches words directly instead of splitting on delimiters; you may want to tinker with it further for speed.

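    import java.util.regex.Pattern;

    // A "word" is a run of word characters: [a-zA-Z_0-9].
    final Pattern WORD = Pattern.compile("\\w+");
    while (scan.hasNextLine())
    {
        Scanner innerScan = new Scanner(scan.nextLine());
        String word;
        // findInLine() skips anything that does not match (such as '.', '?', '!')
        // and returns null when the line has no further match. hasNext(WORD)
        // would instead test the whole next whitespace-delimited token, so a
        // token like "document.Because" would end the loop early.
        while ((word = innerScan.findInLine(WORD)) != null)
        {
            if (!listOfWords.containsKey(word)) {
                listOfWords.put(word, 1);
            } else {
                int countWord = listOfWords.get(word) + 1;
                listOfWords.put(word, countWord);   // put() replaces the old count
            }
        }
        innerScan.close();
    }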
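If speed is the concern, one option is to drop the per-line Scanner and run a reusable Matcher over each line, updating counts with Map.merge. This is a sketch with hypothetical class and method names; it keeps the same \w+ definition of a word as the answer above, and no performance difference has been measured here.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCountWithMatcher {

    // A "word" is a run of \w characters, as in the answer above.
    private static final Pattern WORD = Pattern.compile("\\w+");

    static Map<String, Integer> countWords(BufferedReader reader) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher("");   // one Matcher, reset() for every line
        // lines() is a sequential stream; read errors surface as UncheckedIOException.
        reader.lines().forEach(line -> {
            m.reset(line);
            while (m.find()) {
                // merge() stores 1 for a new word, otherwise adds 1 to the old count.
                counts.merge(m.group(), 1, Integer::sum);
            }
        });
        return counts;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("C:\\mytext.txt"))) {
            System.out.println(countWords(reader));
        }
    }
}
```

By default \w covers only ASCII letters, digits and underscore, so a word like don't is counted as don and t; adjust the pattern if that matters.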


