
Reading multiple files in multithreaded mode

I have an ArrayList. It contains about 20,000 file path elements.

private List<Path> listOfPaths = new ArrayList<>();

I want to read the contents of the files at these paths in multithreaded mode. The problem is that this code runs quite slowly. How can I use several threads so that each of them reads a file and writes it to the dto? And how do I make sure that once one thread has started processing a file, another thread doesn't process the same file?

I created ioPool so as not to block the common pool (which is used by default for parallel stream operations) with IO operations. Normally the advice is that for IO-bound work you can create core-count * 2 threads, but as others mentioned this is really limited by the IO itself.

You can do this as shown below. Note that this won't process your file list in order.

ForkJoinPool ioPool = new ForkJoinPool(8);
ForkJoinTask<?> tasks = ioPool.submit(
        () -> pathList.parallelStream().forEach(p -> { /* your code here */ }));
tasks.get(); // this blocks until all tasks in the pool have finished
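
If you also want the results back out of the pool, the same trick works with a Callable that returns the collected list. A minimal sketch, assuming the listOfPaths field from the question and a readFile method like the one shown further down (exception handling for get() omitted):

ForkJoinPool ioPool = new ForkJoinPool(8);
// submitting the whole pipeline makes the parallel stream run on ioPool instead of the common pool
List<ParamsDTO> results = ioPool.submit(
        () -> listOfPaths.parallelStream()
                         .map(p -> readFile(p))
                         .collect(Collectors.toList()))
        .get(); // blocks until every file has been read
ioPool.shutdown();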

You can likely split the work into smaller chunks, each thread processing a portion of all the files. Each thread would have its own sub-list of data to process and its own list of processed data, to avoid any risk of two threads trying to read/write the same data at the same time. When all the threads have finished, you would collect the results.
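
A minimal sketch of that manual split, assuming the readFile method defined further down and four worker threads (both are illustrative choices; the checked InterruptedException from join() is left unhandled):

int workers = 4;
int chunk = (listOfPaths.size() + workers - 1) / workers;
List<Thread> threads = new ArrayList<>();
List<List<ParamsDTO>> partials = new ArrayList<>();

for (int i = 0; i < listOfPaths.size(); i += chunk) {
    List<Path> slice = listOfPaths.subList(i, Math.min(i + chunk, listOfPaths.size()));
    List<ParamsDTO> partial = new ArrayList<>();
    partials.add(partial);
    // each thread owns its slice and its result list, so no file is ever read twice
    Thread t = new Thread(() -> slice.forEach(p -> partial.add(readFile(p))));
    threads.add(t);
    t.start();
}

for (Thread t : threads) {
    t.join(); // wait for every worker before merging
}
List<ParamsDTO> paramsList = new ArrayList<>();
partials.forEach(paramsList::addAll);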

Actually, you can let the Java 8 parallel streams do the hard work of splitting/merging etc. for you.

Using standard streams, without multiple threads:

List<ParamsDTO> paramsList = listOfPaths.stream().map(p -> readFile(p)).collect(Collectors.toList());

Using parallel streams for improved performance:

List<ParamsDTO> paramsList = listOfPaths.parallelStream().map(p -> readFile(p)).collect(Collectors.toList());

Where you have defined the function readFile as something like:

public ParamsDTO readFile(Path p) {
    ParamsDTO params = new ParamsDTO();
    try {
        params.setParams(Files.readAllBytes(p));
    } catch (IOException e) {
        // streams cannot propagate checked exceptions, so rethrow unchecked
        throw new UncheckedIOException(e);
    }
    return params;
}

In the long run you'll likely want to go beyond that: to control the level of parallelism depending on the type of disk and to get more control in general, go with the Java 5 executors for managing the thread-pool characteristics and plain Runnables or Futures for the tasks to run.
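
A minimal sketch of that executor-based setup, again assuming the readFile method above; the pool size of 8 is only a placeholder for whatever suits your disk, and the checked exceptions from invokeAll()/get() are left unhandled:

ExecutorService executor = Executors.newFixedThreadPool(8);

// one Callable per file; the fixed pool bounds how many run at once
List<Callable<ParamsDTO>> tasks = listOfPaths.stream()
        .map(p -> (Callable<ParamsDTO>) () -> readFile(p))
        .collect(Collectors.toList());

List<ParamsDTO> paramsList = new ArrayList<>();
for (Future<ParamsDTO> f : executor.invokeAll(tasks)) { // invokeAll waits for all tasks to finish
    paramsList.add(f.get());
}
executor.shutdown();

Sizing the pool yourself is exactly the extra control mentioned above: a spinning disk may behave better with only a couple of threads, while an SSD can usually keep more of them busy.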
