
Java - processing documents in parallel

I have, say, 5 documents, and I need to do some processing on each of them. Processing here includes opening the document/file, reading the data, and doing some document manipulation (editing text, etc.). For document manipulation I will probably be using docx4j or apache-poi. My use case is this: I want to process these 4-5 documents in parallel, utilizing the multiple cores available on my CPU. The processing of each document is independent of the others.

What would be the best way to achieve this parallel processing in Java? I have used ExecutorService and the Thread class before, but I don't know much about newer concepts like Streams or RxJava. Can this task be achieved with parallel streams, as introduced in Java 8? Which would be better to use: Executors, Streams, the Thread class, etc.? If Streams can be used, please provide a link to a tutorial on how to do that. Thanks for your help!

You can process the files in parallel with Java Streams using the following pattern.

List<File> files = ...
files.parallelStream().forEach(f -> process(f));

or

File[] files = dir.listFiles();
Stream.of(files).parallel().forEach(f -> process(f));

Note: process cannot throw a checked exception in this example, because the lambda passed to forEach is not allowed to. I suggest you either catch and log the exception inside process or return a result object.
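To make the "return a result object" suggestion concrete, here is a minimal sketch. The names ParallelDocs, Result, safeProcess and processAll are hypothetical, and process here only counts characters as a stand-in for real docx4j or Apache POI work.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelDocs {

    /** Hypothetical result object: holds either a value or the failure for one file. */
    public static final class Result {
        public final Path file;
        public final Integer charCount; // null when processing failed
        public final Exception error;   // null when processing succeeded

        Result(Path file, Integer charCount, Exception error) {
            this.file = file;
            this.charCount = charCount;
            this.error = error;
        }

        public boolean ok() {
            return error == null;
        }
    }

    /** Stand-in for real document work: reads the file and counts its characters. */
    static int process(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).length();
    }

    /** Wraps process() so the checked exception never escapes the stream lambda. */
    static Result safeProcess(Path file) {
        try {
            return new Result(file, process(file), null);
        } catch (IOException e) {
            return new Result(file, null, e);
        }
    }

    /** Processes all files in parallel; results keep the order of the input list. */
    public static List<Result> processAll(List<Path> files) {
        return files.parallelStream()
                    .map(ParallelDocs::safeProcess)
                    .collect(Collectors.toList());
    }
}
```

With this shape one failing document does not abort the others; the caller can inspect each Result afterwards and log or retry the failures.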

If you want to learn about ReactiveX, I would recommend RxJava's Observable.zip: http://reactivex.io/documentation/operators/zip.html

With it you can run multiple tasks in parallel. Here is an example:

// RxJava 1.x imports (an assumption; adjust the packages for RxJava 2 or 3)
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.junit.Test;
import rx.Observable;
import rx.schedulers.Schedulers;

public class ObservableZip {

    // Placeholder inputs; point these at your own documents
    private final File file1 = new File("file1.txt");
    private final File file2 = new File("file2.txt");
    private final File file3 = new File("file3.txt");

    @Test
    public void testAsyncZip() throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        Observable.zip(obAsyncString(file1), obAsyncString(file2), obAsyncString(file3),
                       (s, s2, s3) -> s.concat(s2).concat(s3))
                  .subscribe(result -> {
                                 showResult("All files in one:", result);
                                 done.countDown();
                             },
                             error -> {
                                 error.printStackTrace();
                                 done.countDown();
                             });
        // The work runs on other threads, so wait for it before the test returns
        done.await(30, TimeUnit.SECONDS);
    }

    public void showResult(String transactionType, String result) {
        System.out.println(transactionType + " " + result);
    }

    // Reads one file on its own thread and emits its contents
    public Observable<String> obAsyncString(File file) {
        return Observable.just(file)
                         .observeOn(Schedulers.newThread())
                         .map(ObservableZip::readFile);
    }

    // Reads the file as plain text; replace with your own document handling
    private static String readFile(File file) {
        try {
            return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

As I said, this is only in case you want to learn about ReactiveX. If you don't, adding this framework to your stack just to solve this problem would be overkill, and I would much prefer the parallel stream solution above.
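Since the question mentions prior experience with ExecutorService, here is a rough sketch of the same job done that way, for comparison with the stream approach. The class name ExecutorDocs, the method processAll, and the upper-casing process step are all hypothetical placeholders for the real document work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorDocs {

    /** Stand-in for real document work; replace with docx4j or Apache POI calls. */
    static String process(String documentName) {
        return documentName.toUpperCase(Locale.ROOT);
    }

    /** Runs process() for every document on a fixed-size pool and collects the results in input order. */
    public static List<String> processAll(List<String> documentNames)
            throws InterruptedException, ExecutionException {
        // No more threads than documents or cores, and at least one
        int threads = Math.max(1, Math.min(documentNames.size(),
                                           Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String name : documentNames) {
                tasks.add(() -> process(name));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has finished
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a task failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

The executor version is longer, but it gives explicit control over the number of threads, whereas a parallel stream runs on the shared common ForkJoinPool by default.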


 