Most efficient way to check a file for a list of words

Question

I just had a homework assignment that asked me to add all the Java keywords to a HashSet, then read in a .java file and count how many times any keyword appeared in it.

The route I took was: I created a String[] array containing all the keywords, created a HashSet, and used Collections.addAll to add the array to the HashSet. Then, as I iterated through the text file, I checked each word with HashSet.contains(currentWordFromFile);
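The approach described above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the class name KeywordSetSketch, the method countKeywords, and the abbreviated keyword list are assumptions made for the example, and splitting on non-word characters is a simplification that also counts keywords appearing inside comments and string literals.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name, used only for this sketch.
class KeywordSetSketch {

    // Abbreviated keyword list for illustration; the assignment would list all Java keywords.
    static final String[] KEYWORDS = { "class", "if", "int", "public", "return", "static", "void" };

    // Counts how many words in the given text are members of the keyword set.
    static int countKeywords(String text) {
        Set<String> keywords = new HashSet<String>();
        Collections.addAll(keywords, KEYWORDS);

        int count = 0;
        // Split on runs of non-word characters (assumption: good enough for a homework-level tokenizer).
        for (String word : text.split("\\W+")) {
            if (keywords.contains(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String source = "public class A { static int f() { return 1; } }";
        System.out.println(countKeywords(source)); // prints 5
    }
}
```

If only the total is needed, a Set is enough: each lookup is a single contains call and no per-keyword state is kept.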

Someone recommended using a HashTable to do this. Then I saw a similar example using a TreeSet. I was just curious: what's the recommended way to do this?
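For comparison, here is a small sketch of the two Set choices mentioned. HashSet.contains runs in expected constant time, while TreeSet.contains takes O(log n) but keeps its elements sorted; java.util.Hashtable is a legacy synchronized map rather than a set. For a pure membership test either set gives the same answer. The class name SetChoiceSketch and the helper countMatches are made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class name, used only for this sketch.
class SetChoiceSketch {

    // Counts the words that are members of the given set; works with any Set implementation.
    static int countMatches(Set<String> keywords, List<String> words) {
        int count = 0;
        for (String w : words) {
            if (keywords.contains(w)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> keywordList = Arrays.asList("while", "if", "for");
        List<String> words = Arrays.asList("if", "x", "for", "y", "if");

        Set<String> hashed = new HashSet<String>(keywordList); // unordered, hash-based lookup
        Set<String> sorted = new TreeSet<String>(keywordList); // sorted, tree-based lookup

        System.out.println(countMatches(hashed, words)); // prints 3
        System.out.println(countMatches(sorted, words)); // prints 3
        System.out.println(sorted);                      // prints [for, if, while]
    }
}
```

With a fixed list of a few dozen keywords the difference is unlikely to be measurable; the sorted order of a TreeSet only matters if the keywords need to be printed or traversed in order.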

(Complete code here: http://pastebin.com/GdDmCWj0 )

Answer 1

Try a Map<String, Integer>, where the String is the word and the Integer is the number of times the word has been seen.

One benefit of this is that you do not need to process the file twice.
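A minimal sketch of that idea, assuming the text has already been read into a String: every keyword starts at 0 and one pass over the words updates the per-keyword counts. The class name KeywordMapSketch and the method countPerKeyword are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name, used only for this sketch.
class KeywordMapSketch {

    // Returns a map from each keyword to the number of times it occurs in the text.
    static Map<String, Integer> countPerKeyword(String text, String... keywords) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String k : keywords) {
            counts.put(k, 0);
        }
        // Single pass: split on runs of non-word characters (a simplification).
        for (String word : text.split("\\W+")) {
            Integer seen = counts.get(word);
            if (seen != null) { // only keywords are present in the map
                counts.put(word, seen + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countPerKeyword("if (a) { if (b) return; }", "if", "return", "while");
        System.out.println(counts.get("if"));     // prints 2
        System.out.println(counts.get("return")); // prints 1
        System.out.println(counts.get("while"));  // prints 0
    }
}
```

The map costs a little more than a plain set, but it gives the count for each keyword as well as the total.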

Answer 2

You said you "had a homework assignment", so I'm assuming you're done with this.

I would do it a bit differently. Firstly, I think some of the keywords in your String array were incorrect. According to Wikipedia and Oracle, Java has 50 keywords. Anyway, I've commented my code fairly well. Here's what I came up with...

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class CountKeywords {

    public static void main(String[] args) {

        if (args.length != 1) {
            System.err.println("Usage: java CountKeywords <file.java>");
            return;
        }

        // the 50 keywords, plus "true", "false" and "null", which are
        // reserved literals rather than keywords but are counted here as well
        String[] theKeywords = { "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
                "class", "const", "continue", "default", "do", "double", "else", "enum", "extends", "false",
                "final", "finally", "float", "for", "goto", "if", "implements", "import", "instanceof", "int",
                "interface", "long", "native", "new", "null", "package", "private", "protected", "public",
                "return", "short", "static", "strictfp", "super", "switch", "synchronized", "this", "throw",
                "throws", "transient", "true", "try", "void", "volatile", "while" };

        // put each keyword in the map with value 0
        Map<String, Integer> theKeywordCount = new HashMap<String, Integer>();
        for (String str : theKeywords) {
            theKeywordCount.put(str, 0);
        }

        // attempt to open and read the file; try-with-resources closes
        // the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {

            String sLine;

            // read lines until reaching the end of the file
            while ((sLine = br.readLine()) != null) {

                // extract the words from the current line in the file
                // (splitting on non-word characters; empty lines yield no keywords)
                for (String word : sLine.split("\\W+")) {

                    // if the word is a keyword, increment its count
                    if (theKeywordCount.containsKey(word)) {
                        theKeywordCount.put(word, theKeywordCount.get(word) + 1);
                    }
                }
            }

        } catch (IOException exception) {
            // Unable to find the file or read a line.
            exception.printStackTrace();
        }

        // count how many times any keyword was encountered
        int occurrences = 0;
        for (Integer i : theKeywordCount.values()) {
            occurrences += i;
        }

        System.out.println("\n\nTotal occurrences in file: " + occurrences);
    }
}

Every time I encounter a word from the file, I first check if it's in the Map; if it isn't, it's not a valid keyword; if it is, I update the value the keyword is associated with, i.e., I increment the associated Integer by 1 because we've seen this keyword once more.

Alternatively, you could get rid of that last for loop and just keep a running count, so you would instead have...

// word is the current word read from the file
if (theKeywordCount.containsKey(word)) {
    occurrences++;
}

... and you print out the counter at the end.

I don't know if this is the most efficient way to do this, but I think it's a solid start.

Let me know if you have any questions. I hope this helps.
Hristo

2 answers

Answer 1
2 · accepted · 2011-04-27 05:22:15

Answer 2
1 · 2011-04-27 06:12:47
