How much time should it take to count the frequency of 100K words in Java?
I have to read through a text file with roughly 100K words and create a HashMap with the frequency of each word. The code I have so far takes about 15-20 minutes to execute, and I'm guessing I'm doing something horribly wrong.
How long should a task like this take to execute?
This is the code I'm using:
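Scanner scanner = new Scanner(new FileReader("myFile.txt"));
HashMap<String, Integer> wordFrequencies = new HashMap<>();
// hasNext() pairs with next(); hasNextLine() can return true when only a
// trailing newline remains, and next() would then throw NoSuchElementException.
while (scanner.hasNext()) {
    wordFrequencies.merge(scanner.next(), 1, Integer::sum);
}
return wordFrequencies;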
It should take next to no time. As in, if you're doing this just once, you should barely notice the time it takes. If it's taking 20 minutes, you're processing roughly 100 words per second, which is abysmal performance, even if your words are really long.
From the Javadoc of BufferedReader (emphasis added):
In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders.
Try wrapping the FileReader in a BufferedReader:
Scanner scanner = new Scanner(new BufferedReader(new FileReader("myFile.txt")));
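For reference, here is a minimal self-contained sketch of the whole thing with the buffered reader in place, plus a crude timing check. The class name WordFrequency and the method countWords are made up for illustration, and the file name is taken from the question. The loop checks hasNext() so that the check matches the next() call. Treat it as a starting point under those assumptions, not a tuned implementation.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class WordFrequency {

    // Hypothetical helper: counts whitespace-separated tokens in the given file.
    static Map<String, Integer> countWords(String fileName) throws IOException {
        Map<String, Integer> wordFrequencies = new HashMap<>();
        // The BufferedReader sits between the Scanner and the FileReader,
        // so tokens are served from an in-memory buffer.
        try (Scanner scanner = new Scanner(new BufferedReader(new FileReader(fileName)))) {
            while (scanner.hasNext()) {
                // Insert 1 for a new word, otherwise add 1 to the existing count.
                wordFrequencies.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return wordFrequencies;
    }

    public static void main(String[] args) throws IOException {
        // File name from the question; pass another path as the first argument.
        String fileName = args.length > 0 ? args[0] : "myFile.txt";
        long start = System.nanoTime();
        Map<String, Integer> frequencies = countWords(fileName);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(frequencies.size() + " distinct words in " + elapsedMs + " ms");
    }
}
```

If the elapsed time printed by main is still in the minutes range for a 100K-word file, the bottleneck is probably somewhere other than this loop (for example, where the file lives or what else the surrounding program does).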