简体   繁体   中英

Most Efficient Way to Check File for List of Words

I just had a homework assignment that wanted me to add all the Java keywords to a HashSet. Then read in a .java file, and count how many times any keyword appeared in the .java file.

The route I took was: Created an String[] array that contained all the keywords. Created a HashSet, and used Collections.addAll to add the array to the HashSet. Then as I iterated through the text file I would check it by HashSet.contains(currentWordFromFile);

Someone recommended using a HashTable to do this. Then I seen a similar example using a TreeSet. I was just curious.. what's the recommended way to do this?

(Complete code here: http://pastebin.com/GdDmCWj0 )

Try a Map<String, Integer> where the String is the word and the Integer is the number of times the word has been seen.

One benefit of this is that you do not need to process the file twice.

You said "had a homework assignment" so I'm assuming you're done with this.

I would do it a bit differently. Firstly, I think some of the keywords in your String array were incorrect. According to Wikipedia and Oracle , Java has 50 keywords. Anyway, I've commented my code fairly well. Here's what I came up with...

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.HashMap;

public class CountKeywords {

    public static void main(String args[]) {

        String[] theKeywords = { "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char", "class", "const", "continue", "default", "do", "double", "else", "enum", "extends", "false", "final", "finally", "float", "for", "goto", "if", "implements", "import", "instanceof", "int", "interface", "long", "native", "new", "null", "package", "private", "protected", "public", "return", "short", "static", "strictfp", "super", "switch", "synchronized", "this", "throw", "throws", "transient", "true", "try", "void", "volatile", "while" };

        // put each keyword in the map with value 0 
        Map<String, Integer> theKeywordCount = new HashMap<String, Integer>();
        for (String str : theKeywords) {
            theKeywordCount.put(str, 0);
        }

        FileReader fr;
        BufferedReader br;
        File file = new File(args[0]);

        // attempt to open and read file
        try {
            fr = new FileReader(file);
            br = new BufferedReader(fr);

            String sLine;

            // read lines until reaching the end of the file
            while ((sLine = br.readLine()) != null) {

                // if an empty line was read
                if (sLine.length() != 0) {

                    // extract the words from the current line in the file
                    if (theKeywordCount.containsKey(sLine)) {
                        theKeywordCount.put(sLine, theKeywordCount.get(sLine) + 1);
                    }
                }
            }

        } catch (FileNotFoundException exception) {
            // Unable to find file.
            exception.printStackTrace();
        } catch (IOException exception) {
            // Unable to read line.
            exception.printStackTrace();
        } finally {
                br.close();
            }

        // count how many times each keyword was encontered
        int occurrences = 0;
        for (Integer i : theKeywordCount.values()) {
            occurrences += i;
        }

        System.out.println("\n\nTotal occurences in file: " + occurrences);
    }
}

Every time I encounter a keyword from the file, I first check if its in the Map; if it isn't, its not a valid keyword; if it is, then I update the value the keyword is associated with, ie, I increment the associated Integer by 1 because we've seen this keyword once more.

Alternatively, you could get rid of that last for loop and just keep a running count, so you would instead have...

if (theKeywordCount.containsKey(sLine)) {
    occurrences++;
}

... and you print out the counter at the end.

I don't know if this is the most efficient way to do this, but I think its a solid start.

Let me know if you have any questions. I hope this helps.
Hristo

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM