
How to read a file concurrently in Java + Akka

I have a big CSV file with key:value rows. How can I read it in parallel? I cannot divide it into chunks because the rows have different byte sizes. What should I do in this case?

I cannot find an example in Java.

There is actually no reason to read the same file in parallel, because that does NOT increase the speed of reading it. If you want to read a file, you have several options:

  1. You read the whole file into one byte[] at once; that's the fastest way to load the file, and afterwards you can split it into lines and process the data (see the sketch after this list).

  2. You read the lines from the file using a Scanner and its nextLine method. That's not really efficient, so I don't recommend it.

  3. You read the file through a buffer byte array. That's a memory-efficient solution, but option 1 is still the best.
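A minimal sketch of option 1, assuming the rows look like key:value and the whole file fits in memory; the file name data.csv is only a placeholder:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class WholeFileRead {
    public static void main(String[] args) throws IOException {
        // Option 1: load the whole file into one byte[] with a single call
        byte[] bytes = Files.readAllBytes(Paths.get("data.csv"));

        // Split into lines and parse key:value pairs in memory
        Map<String, String> entries = new HashMap<>();
        for (String line : new String(bytes, StandardCharsets.UTF_8).split("\\R")) {
            int sep = line.indexOf(':');
            if (sep > 0) {
                entries.put(line.substring(0, sep), line.substring(sep + 1));
            }
        }
        System.out.println("Loaded " + entries.size() + " entries");
    }
}
```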

Also, because file loading is relatively slow (compared to working on the data in RAM), you should create one thread (yes, only one, there is no need for more) that reads all the files into byte arrays, and maybe another thread that converts each byte[] into your loaded configuration, because that can also take a lot of time if the file is big.
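A minimal sketch of that two-thread layout, assuming one reader thread hands whole-file byte[] contents to one parser thread over a BlockingQueue; the file names a.csv and b.csv are placeholders:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ReaderParserPipeline {
    // Sentinel that tells the parser thread there is nothing more to parse
    private static final byte[] POISON_PILL = new byte[0];

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4);
        List<String> files = List.of("a.csv", "b.csv"); // placeholder file names

        // One thread does all the (slow) disk I/O
        Thread reader = new Thread(() -> {
            try {
                for (String file : files) {
                    queue.put(Files.readAllBytes(Paths.get(file)));
                }
                queue.put(POISON_PILL);
            } catch (Exception e) {
                e.printStackTrace();
            }
        });

        // One thread converts the loaded bytes into usable data
        Thread parser = new Thread(() -> {
            try {
                byte[] bytes;
                while ((bytes = queue.take()) != POISON_PILL) {
                    String[] lines = new String(bytes, StandardCharsets.UTF_8).split("\\R");
                    System.out.println("Parsed " + lines.length + " lines");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        parser.start();
        reader.join();
        parser.join();
    }
}
```

The bounded queue keeps the reader from loading files much faster than the parser can consume them, so memory use stays predictable.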

