How much time should it take to count the frequency of 100K words in Java?

I have to read through a text file with roughly 100K words and create a HashMap with the frequency of each word. The code I have so far takes about 15-20 minutes to execute, and I'm guessing I'm doing something horribly wrong. How long should such a task take?

This is the code I'm using:

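    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Scanner;

    Scanner scanner = new Scanner(new FileReader("myFile.txt"));
    HashMap<String, Integer> wordFrequencies = new HashMap<>();
    // hasNext() pairs with next(): both work on whitespace-separated tokens
    while (scanner.hasNext()) {
        wordFrequencies.merge(scanner.next(), 1, (a, b) -> a + b); // add 1 to this word's count
    }
    scanner.close();
    return wordFrequencies;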

It should take next to no time: if you're doing this just once, you should barely notice how long it takes. If it's taking 20 minutes, you're processing roughly 100 words per second (100,000 words / 1,200 s ≈ 83 words/s), which is abysmal performance even if your words are really long.
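To put "next to no time" in perspective, here is a rough sketch that times the same `HashMap.merge` counting (with `Integer::sum` in place of the lambda) on 100,000 synthetic words held in memory. The class name `WordCountTiming`, the 5,000-word vocabulary and the random seed are arbitrary assumptions, and because the words are generated in memory it measures only the counting, not reading the file. The printed time will vary by machine and JVM warm-up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class WordCountTiming {
    // Counts occurrences of each word with the same HashMap.merge idiom as the question.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            freq.merge(w, 1, Integer::sum);
        }
        return freq;
    }

    // Builds n synthetic words drawn from vocabSize distinct strings ("word0", "word1", ...).
    static String[] syntheticWords(int n, int vocabSize, long seed) {
        Random rnd = new Random(seed);
        String[] words = new String[n];
        for (int i = 0; i < n; i++) {
            words[i] = "word" + rnd.nextInt(vocabSize);
        }
        return words;
    }

    public static void main(String[] args) {
        String[] words = syntheticWords(100_000, 5_000, 42L);
        long start = System.nanoTime();
        Map<String, Integer> freq = count(words);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Timing varies by machine and JVM warm-up; no particular figure is implied.
        System.out.println(freq.size() + " distinct words counted in " + elapsedMs + " ms");
    }
}
```

If this finishes in a fraction of a second while the file-based version takes minutes, the time is going into reading the input rather than into the HashMap.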

From the Javadoc of BufferedReader:

In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders.
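The Javadoc's point can be observed with a small experiment. This sketch uses a hypothetical `CountingReader` that counts how many read requests reach the Reader it wraps; a `StringReader` stands in for the file, so it counts calls rather than measuring real disk cost. Reading 100,000 characters one `read()` at a time sends one request per character to the source (plus one at end of input) without buffering, but only one request per buffer refill with a `BufferedReader` in between.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadCallCounter {
    // Hypothetical wrapper that counts how many read requests reach the Reader it wraps.
    static class CountingReader extends Reader {
        private final Reader in;
        int calls = 0;

        CountingReader(Reader in) {
            this.in = in;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            calls++;
            return in.read(cbuf, off, len);
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    // Reads `length` characters one read() call at a time, optionally through a
    // BufferedReader, and returns how many read requests reached the source.
    static int callsFor(int length, boolean buffered) {
        CountingReader source = new CountingReader(new StringReader("x".repeat(length)));
        Reader reader = buffered ? new BufferedReader(source) : source;
        try {
            while (reader.read() != -1) {
                // discard the character; only the number of source requests matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + callsFor(100_000, false) + " source reads");
        System.out.println("buffered:   " + callsFor(100_000, true) + " source reads");
    }
}
```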

Try wrapping the FileReader in a BufferedReader:

    Scanner scanner = new Scanner(new BufferedReader(new FileReader("myFile.txt")));
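A complete version of the question's method with the BufferedReader wrapper might look like the sketch below. The class name `WordFrequencies`, the `countWords(String path)` signature and the optional command-line argument are illustrative assumptions. It loops on `hasNext()` to match `next()`, uses `Integer::sum` for the merge, and relies on try-with-resources: closing the Scanner closes the BufferedReader, which in turn closes the FileReader. Note that FileReader decodes the file with the platform default charset.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class WordFrequencies {
    // Counts whitespace-separated words in the file at `path`.
    static Map<String, Integer> countWords(String path) throws IOException {
        Map<String, Integer> wordFrequencies = new HashMap<>();
        // BufferedReader around FileReader; closing the Scanner closes both readers.
        try (Scanner scanner = new Scanner(new BufferedReader(new FileReader(path)))) {
            while (scanner.hasNext()) {
                wordFrequencies.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return wordFrequencies;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "myFile.txt";
        Map<String, Integer> freq = countWords(path);
        System.out.println(freq.size() + " distinct words");
    }
}
```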
