
How to read a file concurrently in Java + Akka

I have a big CSV file with key:value rows. How can I read it in parallel? I cannot divide it into chunks because the rows have different byte sizes. What should I do in this case?

I cannot find an example in Java.

There is actually no reason to read the same file in parallel, because that does NOT increase the speed of reading it. If you want to read a file, you have several options:

  1. You read the whole file into one byte[] at once; that's the fastest way to load the file, and afterwards you can split it into lines and process the data (see the sketch after this list).

  2. You read the lines from the file using a Scanner and its nextLine method. That's not really efficient, so I don't recommend it.

  3. You read the file through a buffer byte array. That's a memory-efficient solution, but option 1 is still the best.
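A minimal sketch of option 1, assuming the rows look like key:value and the whole file fits in memory; the file name data.csv is only a placeholder:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class WholeFileRead {
    public static void main(String[] args) throws IOException {
        // Option 1: load the whole file into one byte[] with a single call
        byte[] bytes = Files.readAllBytes(Paths.get("data.csv"));

        // Split into lines and parse key:value pairs in memory
        Map<String, String> entries = new HashMap<>();
        for (String line : new String(bytes, StandardCharsets.UTF_8).split("\\R")) {
            int sep = line.indexOf(':');
            if (sep > 0) {
                entries.put(line.substring(0, sep), line.substring(sep + 1));
            }
        }
        System.out.println("Loaded " + entries.size() + " entries");
    }
}
```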

Also, because file loading is relatively slow (compared to working on the data in RAM), you should create one thread (yes, only one, there is no need for more) that reads all the files into byte arrays, and maybe another thread that converts each byte[] into your loaded configuration, because that can also take a lot of time if the file is big.
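A minimal sketch of that two-thread layout, assuming one reader thread hands whole-file byte[] contents to one parser thread over a BlockingQueue; the file names a.csv and b.csv are placeholders:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ReaderParserPipeline {
    // Sentinel that tells the parser thread there is nothing more to parse
    private static final byte[] POISON_PILL = new byte[0];

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);
        List<String> files = List.of("a.csv", "b.csv"); // placeholder file names

        // One thread does all the (slow) disk I/O
        Thread reader = new Thread(() -> {
            try {
                for (String file : files) {
                    queue.put(Files.readAllBytes(Paths.get(file)));
                }
                queue.put(POISON_PILL);
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        // One thread converts the loaded bytes into usable data
        Thread parser = new Thread(() -> {
            try {
                byte[] bytes;
                while ((bytes = queue.take()) != POISON_PILL) {
                    String[] lines = new String(bytes, StandardCharsets.UTF_8).split("\\R");
                    System.out.println("Parsed " + lines.length + " lines");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        parser.start();
        reader.join();
        parser.join();
    }
}
```

The bounded queue keeps the reader from loading files much faster than the parser can consume them, so memory use stays predictable.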

