
How much time should it take to count the frequency of 100K words in Java?

I have to read through a text file with roughly 100K words and create a HashMap with the frequency of each word. The code I have so far takes about 15-20 minutes to execute, and I'm guessing I'm doing something horribly wrong. How long should a task like this take to run?

This is the code I'm using:

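    // Read whitespace-separated tokens and tally how often each one occurs.
    Scanner scanner = new Scanner(new FileReader("myFile.txt"));
    HashMap<String, Integer> wordFrequencies = new HashMap<>();
    // hasNext() pairs with next(); hasNextLine() can return true when no token is left.
    while (scanner.hasNext()) {
        wordFrequencies.merge(scanner.next(), 1, Integer::sum);
    }
    return wordFrequencies;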

It should take next to no time. As in, if you're doing this just once, you should barely notice the time it takes. If it's taking 20 minutes (1,200 seconds) for 100K words, you're processing roughly 80 words per second, which is abysmal performance, even if your words are really long.
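If you want to check this on your own machine, here is a rough timing sketch. It counts 100,000 tokens held in memory, so file I/O is taken out of the picture. The class name `CountTiming`, the helper `syntheticText`, and the made-up vocabulary of 5,000 words are assumptions for illustration, not anything from the question.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class CountTiming {

    // Builds a synthetic text of wordCount tokens drawn from a made-up
    // vocabulary of 5,000 words, ten tokens per line.
    static String syntheticText(int wordCount) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < wordCount; i++) {
            text.append("word").append(i % 5000);
            text.append(i % 10 == 9 ? '\n' : ' ');
        }
        return text.toString();
    }

    // Same counting loop as in the question, reading from memory instead of a file.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> frequencies = new HashMap<>();
        try (Scanner scanner = new Scanner(new StringReader(text))) {
            while (scanner.hasNext()) {
                frequencies.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        String text = syntheticText(100_000);
        long start = System.nanoTime();
        Map<String, Integer> frequencies = count(text);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(frequencies.size() + " distinct words in " + elapsedMillis + " ms");
    }
}
```

The printed time will vary with hardware and JVM warm-up, so treat it as a ballpark figure. If it comes out far below the minutes you are seeing, the counting loop itself is unlikely to be the bottleneck.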

From the Javadoc of BufferedReader (emphasis added):

In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders.

Try wrapping the FileReader in a BufferedReader:

    Scanner scanner = new Scanner(new BufferedReader(new FileReader("myFile.txt")));
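Put together, a complete program might look like the sketch below. The class name `WordFrequency` and the `count` helper are hypothetical names chosen for this example, and the file name defaults to the `myFile.txt` from the question. Tokens are split on whitespace only, so punctuation stays attached to words, just as in the original loop.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class WordFrequency {

    // Counts whitespace-separated tokens from any Reader.
    // The source is wrapped in a BufferedReader, as the Javadoc advises.
    static Map<String, Integer> count(Reader source) {
        Map<String, Integer> frequencies = new HashMap<>();
        // Closing the Scanner also closes the wrapped readers.
        try (Scanner scanner = new Scanner(new BufferedReader(source))) {
            while (scanner.hasNext()) {
                frequencies.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the question; pass a path to override it.
        String fileName = args.length > 0 ? args[0] : "myFile.txt";
        Map<String, Integer> frequencies = count(new FileReader(fileName));
        System.out.println(frequencies.size() + " distinct words");
    }
}
```

Taking a `Reader` rather than a file name keeps the counting logic easy to try on a small string, for example `WordFrequency.count(new java.io.StringReader("a b a"))`, before pointing it at the real file.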
