Java - processing documents in parallel
I have, say, 5 documents and need to do some processing on each of them. Processing here includes opening the document/file, reading the data, and doing some document manipulation (editing text etc.). For document manipulation I will probably be using docx4j or apache-poi. My use case is this: I want to process these 4-5 documents in parallel, utilizing the multiple cores available on my CPU. The processing of each document is independent of the others.
What would be the best way to achieve this parallel processing in Java? I have used ExecutorService in Java before, and the Thread class too, but I don't know much about newer concepts like Streams or RxJava. Can this task be achieved using parallel streams, as introduced in Java 8? Which would be better to use: Executors, Streams, the Thread class, etc.? If Streams can be used, please provide a link to a tutorial on how to do that. Thanks for your help!
You can process the files in parallel with Java Streams using the following pattern:
List<File> files = ...
files.parallelStream().forEach(f -> process(f));
or
File[] files = dir.listFiles();
Stream.of(files).parallel().forEach(f -> process(f));
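To make the pattern above concrete, here is a minimal, self-contained sketch. The class name `ParallelStreamDemo`, the body of `process` (which only counts characters) and the `tempFile` helper are assumptions made for illustration; in a real program `process` would hold the docx4j or apache-poi calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ParallelStreamDemo {

    // Hypothetical stand-in for the real per-document work:
    // reads the file and returns its length in characters.
    static int process(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).length();
        } catch (IOException e) {
            // The lambda passed to forEach cannot throw a checked exception, so wrap it.
            throw new UncheckedIOException(e);
        }
    }

    // Processes every file on the common fork-join pool, one result per file.
    // A ConcurrentHashMap is used because several threads write to it at once.
    static Map<Path, Integer> processAll(List<Path> files) {
        Map<Path, Integer> results = new ConcurrentHashMap<>();
        files.parallelStream().forEach(f -> results.put(f, process(f)));
        return results;
    }

    // Demo helper (assumption): creates a temporary file with the given content.
    static Path tempFile(String content) {
        try {
            Path p = Files.createTempFile("doc", ".txt");
            Files.write(p, content.getBytes(StandardCharsets.UTF_8));
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Path> files = Arrays.asList(tempFile("a"), tempFile("bb"), tempFile("ccc"));
        processAll(files).forEach((file, length) ->
                System.out.println(file.getFileName() + " -> " + length));
    }
}
```

Parallel streams run on the shared common pool by default, which is usually fine for a handful of independent documents. The order in which files finish is not guaranteed.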
Note: process cannot throw a checked exception in this example. I suggest you either catch and log the exception inside process, or have it return a result object.
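One possible shape for the "result object" approach is sketched below. The `Result` class, `safeProcess`, and the use of `Files.size` as the stand-in for real processing are all assumptions for illustration, not part of any library. The idea is only that the checked exception is caught per file and carried back as data.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ResultObjectDemo {

    // Hypothetical result type: holds either a value or the exception that occurred.
    static final class Result {
        final Path file;
        final Long size;        // null when processing failed
        final Exception error;  // null when processing succeeded

        Result(Path file, Long size, Exception error) {
            this.file = file;
            this.size = size;
            this.error = error;
        }

        boolean isSuccess() {
            return error == null;
        }
    }

    // Stand-in for the real work; note that it declares a checked exception.
    static long process(Path file) throws IOException {
        return Files.size(file);
    }

    // Converts the checked exception into a Result so it can be used in a stream lambda.
    static Result safeProcess(Path file) {
        try {
            return new Result(file, process(file), null);
        } catch (IOException e) {
            return new Result(file, null, e);
        }
    }

    // Processes all files in parallel; the returned list keeps the input order.
    static List<Result> processAll(List<Path> files) {
        return files.parallelStream()
                .map(ResultObjectDemo::safeProcess)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder paths; the second one is expected not to exist.
        List<Path> files = Arrays.asList(Paths.get("."), Paths.get("no-such-document-12345.docx"));
        for (Result r : processAll(files)) {
            System.out.println(r.file + ": "
                    + (r.isSuccess() ? r.size + " bytes" : "failed: " + r.error));
        }
    }
}
```

With this shape, one failing document does not abort the others, and the caller decides afterwards whether to log or retry the failures.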
If you want to learn about ReactiveX, I would recommend using RxJava's Observable.zip: http://reactivex.io/documentation/operators/zip.html
With it you can run multiple processes in parallel. Here is an example:
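// Imports assume RxJava 1.x and JUnit 4; adjust the packages for RxJava 2.x/3.x.
import java.io.File;
import org.junit.Test;
import rx.Observable;
import rx.Scheduler;
import rx.schedulers.Schedulers;

public class ObservableZip {
    // Placeholder inputs: replace these with the documents you want to process.
    private final File file1 = new File("file1.txt");
    private final File file2 = new File("file2.txt");
    private final File file3 = new File("file3.txt");

    private Scheduler scheduler;
    private Scheduler scheduler1;
    private Scheduler scheduler2;

    @Test
    public void testAsyncZip() {
        scheduler = Schedulers.newThread();  // thread to open and read file 1
        scheduler1 = Schedulers.newThread(); // thread to open and read file 2
        scheduler2 = Schedulers.newThread(); // thread to open and read file 3
        Observable.zip(obAsyncString(file1), obAsyncString1(file2), obAsyncString2(file3),
                (s, s2, s3) -> s.concat(s2).concat(s3))
            .toBlocking() // wait for the worker threads before the test method returns
            .forEach(result -> showResult("All files in one:", result));
    }

    public void showResult(String transactionType, String result) {
        System.out.println(transactionType + " " + result);
    }

    public Observable<String> obAsyncString(File file) {
        return Observable.just(file)
            .observeOn(scheduler)
            .map(val -> {
                // Here you read your file; this placeholder only returns its name.
                return val.getName();
            });
    }

    public Observable<String> obAsyncString1(File file) {
        return Observable.just(file)
            .observeOn(scheduler1)
            .map(val -> {
                // Here you read your file 2; this placeholder only returns its name.
                return val.getName();
            });
    }

    public Observable<String> obAsyncString2(File file) {
        return Observable.just(file)
            .observeOn(scheduler2)
            .map(val -> {
                // Here you read your file 3; this placeholder only returns its name.
                return val.getName();
            });
    }
}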
Like I said, this is only in case you want to learn about ReactiveX. Otherwise, adding this framework to your stack just to solve this issue would be a little overkill, and I would much prefer the previous parallel stream solution.
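Since the question also mentions ExecutorService, here is a rough sketch of the same independent per-document work done with a fixed thread pool, for comparison with the two approaches above. The class name `ExecutorDemo` and the body of `process` (which only upper-cases a document name) are assumptions for illustration; the real document handling would replace it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorDemo {

    // Hypothetical stand-in for the real per-document work.
    static String process(String documentName) {
        return documentName.toUpperCase();
    }

    // Submits one task per document and collects the results in input order.
    static List<String> processAll(List<String> documents) {
        // Use no more threads than there are documents or CPU cores (at least one).
        int threads = Math.max(1,
                Math.min(documents.size(), Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String doc : documents) {
                Callable<String> task = () -> process(doc);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(Arrays.asList("a.docx", "b.docx", "c.docx")));
    }
}
```

Compared with a parallel stream, this gives explicit control over the pool size and lifetime at the cost of more boilerplate.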