How long should it take to count the frequencies of 100K words in Java?

Question

I have to read through a text file with roughly 100K words and create a HashMap with the frequency of each word. The code I have so far takes about 15-20 minutes to execute, and I'm guessing I'm doing something horribly wrong. How long should a task like this take to execute?

This is the code I'm using:

    Scanner scanner = new Scanner(new FileReader("myFile.txt"));
    HashMap<String, Integer> wordFrequencies = new HashMap<>();
    while (scanner.hasNextLine()) {
        wordFrequencies.merge(scanner.next(), 1, (a, b) -> a + b);
    }
    return wordFrequencies;

Answer 1

It should take next to no time. That is, if you're doing this just once, you should barely notice how long it takes. If it's taking 20 minutes, you're processing roughly 100 words per second (100,000 words / 1,200 seconds ≈ 83), which is abysmal performance, even if your words are really long.
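To put that figure in perspective, the counting step on its own is cheap. The sketch below times HashMap.merge over 100,000 words held in memory. The word list (syntheticWords, 5,000 distinct values) is hypothetical test data standing in for the file's contents, and no disk I/O is involved, so it only bounds the cost of the map updates. The measured time will vary by machine, and this single naive timing ignores JIT warm-up, but it should come out well under a second.

```java
import java.util.HashMap;
import java.util.Map;

class InMemoryCount {
    // Counts occurrences with HashMap.merge, as the question does;
    // Integer::sum is equivalent to the question's (a, b) -> a + b.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            freq.merge(w, 1, Integer::sum);
        }
        return freq;
    }

    // Hypothetical test data: n words cycling through `distinct` values ("word0", "word1", ...).
    static String[] syntheticWords(int n, int distinct) {
        String[] words = new String[n];
        for (int i = 0; i < n; i++) {
            words[i] = "word" + (i % distinct);
        }
        return words;
    }

    public static void main(String[] args) {
        String[] words = syntheticWords(100_000, 5_000);
        long start = System.nanoTime();
        Map<String, Integer> freq = count(words);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(freq.size() + " distinct words counted in " + elapsedMs + " ms");
    }
}
```

With these parameters every one of the 5,000 distinct words ends up with a count of 20, so a slow run of the real program points at how the input is read rather than at the HashMap.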

From the Javadoc of BufferedReader:

In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders.
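The Javadoc's point about read requests can be made concrete. The sketch below wraps a StringReader in a CountingReader, a hypothetical FilterReader subclass written only for this illustration. It then reads 100,000 characters one at a time, once directly and once through a BufferedReader, and reports how many requests reach the underlying reader. It counts requests rather than measuring time, and it uses an in-memory string rather than a file.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class BufferDemo {
    // Hypothetical helper: forwards to the wrapped Reader and counts every read request it receives.
    static class CountingReader extends FilterReader {
        int calls = 0;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            calls++;
            return super.read(cbuf, off, len);
        }
    }

    // Reads `text` one char at a time and returns how many requests reached the underlying reader.
    static int underlyingCalls(String text, boolean buffered) {
        CountingReader counter = new CountingReader(new StringReader(text));
        try (Reader reader = buffered ? new BufferedReader(counter) : counter) {
            while (reader.read() != -1) {
                // Discard the character; only the request count matters here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.calls;
    }

    public static void main(String[] args) {
        String text = "word ".repeat(20_000); // 100,000 characters
        System.out.println("unbuffered: " + underlyingCalls(text, false));
        System.out.println("buffered:   " + underlyingCalls(text, true));
    }
}
```

Without buffering, every read() call is forwarded, so the first count is 100,001 (one per character plus the final end-of-stream call). With the BufferedReader in between, the underlying reader only sees a handful of bulk reads that refill the BufferedReader's internal buffer.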

Try wrapping the FileReader in a BufferedReader:

    Scanner scanner = new Scanner(new BufferedReader(new FileReader("myFile.txt")));
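Combined with the question's loop, a complete version might look like the sketch below. The class and method names (WordFrequencies, countWords) are hypothetical, and try-with-resources is added so the file is closed. The sketch also replaces hasNextLine() with hasNext(). The question's loop checks for another line but then reads a token with next(), so input that ends with whitespace, such as a trailing newline, can make next() throw NoSuchElementException. hasNext() checks for a token, which is what next() consumes. This is a correctness fix, not a benchmark.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class WordFrequencies {
    // Hypothetical helper: counts whitespace-separated tokens in the file at `path`.
    static Map<String, Integer> countWords(String path) throws IOException {
        Map<String, Integer> wordFrequencies = new HashMap<>();
        // Closing the Scanner also closes the BufferedReader and FileReader it wraps.
        try (Scanner scanner = new Scanner(new BufferedReader(new FileReader(path)))) {
            // hasNext() checks for another token, which is what next() consumes.
            while (scanner.hasNext()) {
                wordFrequencies.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return wordFrequencies;
    }
}
```

Called as WordFrequencies.countWords("myFile.txt"), it returns the same kind of map as the original snippet, without failing on a trailing newline.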

Answer 1: 2 votes, accepted, 2017-11-04 18:52:18
