Java spark Enumeration in parallel
I am running the following code in Java Spark:
ZipFile zipFile = new ZipFile(zipFilePath);
Enumeration<? extends ZipEntry> entries = zipFile.entries();
while (entries.hasMoreElements()) {
    ZipEntry entry = entries.nextElement();
    // my logic...
}
I want to run the code above in parallel, either with Spark or with plain Java. How can I do that?
Thanks.
The code below processes the logic for each entry in the enumeration concurrently, in Java and in Scala respectively.
In Java:
List<ZipEntry> entriesList = Collections.list(enumeration);
List<CompletableFuture<ZipEntry>> futureList = entriesList.stream()
        .map(x -> CompletableFuture.supplyAsync(() -> {
            // logic
            return x;
        }))
        .collect(Collectors.toList());
CompletableFuture.allOf(futureList.toArray(new CompletableFuture[0])).join();
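A complete, runnable sketch of this approach (the temp zip and the per-entry "logic" of returning the entry name are placeholders standing in for your real archive and processing):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ParallelZipEntries {
    public static void main(String[] args) throws IOException {
        // Create a small throwaway zip so the example runs standalone
        Path zipPath = Files.createTempFile("demo", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            for (String name : new String[]{"a.txt", "b.txt", "c.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(name.getBytes());
                zos.closeEntry();
            }
        }

        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            // Snapshot the Enumeration into a List so entries can be handed out concurrently
            List<? extends ZipEntry> entries = Collections.list(zipFile.entries());

            List<CompletableFuture<String>> futures = entries.stream()
                    .map(entry -> CompletableFuture.supplyAsync(() -> {
                        // placeholder logic: just return the entry name
                        return entry.getName();
                    }))
                    .collect(Collectors.toList());

            // Block until every task has finished
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

            for (CompletableFuture<String> f : futures) {
                System.out.println(f.join());
            }
        }
    }
}
```

Note that the `ZipFile` must stay open while the futures read from it, which is why the waiting happens inside the try-with-resources block.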
In Scala:
val entriesList = ... // convert the Enumeration to a Scala List
val futureList: List[Future[ZipEntry]] = entriesList.map(x => Future {
  // logic
})
Future.sequence(futureList) // requires an implicit ExecutionContext in scope
Hope this helps.
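If you don't need explicit futures, a parallel stream over the materialized entry list is an even shorter option. A minimal sketch, again using a throwaway temp zip in place of your real file (output order is not guaranteed, since the common ForkJoinPool processes entries concurrently):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ParallelStreamZip {
    public static void main(String[] args) throws IOException {
        // Build a throwaway zip so the example is self-contained
        Path zipPath = Files.createTempFile("demo", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            for (String name : new String[]{"x.txt", "y.txt"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.closeEntry();
            }
        }

        try (ZipFile zipFile = new ZipFile(zipPath.toFile())) {
            // Materialize the Enumeration, then let the common ForkJoinPool fan out
            Collections.list(zipFile.entries())
                    .parallelStream()
                    .forEach(entry -> System.out.println("entry: " + entry.getName()));
        }
    }
}
```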
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class ParallelEnumeration {
    public static void main(String[] args) {
        String zipFilePath = "/ZipDir/";
        File zipDir = new File(zipFilePath);
        final List<File> files = Arrays.asList(Objects.requireNonNull(zipDir.listFiles()));

        // configure spark
        SparkConf sparkConf = new SparkConf().setAppName("Print Elements of RDD")
                .setMaster("local[*]");
        // start a spark context
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // parallelize the file collection into two partitions
        jsc.parallelize(files, 2)
                .filter(file -> { // this filter is optional if the directory contains only zip files
                    // https://stackoverflow.com/questions/33934178/how-to-identify-a-zip-file-in-java
                    try (DataInputStream in = new DataInputStream(
                            new BufferedInputStream(new FileInputStream(file)))) {
                        return in.readInt() == 0x504b0304;
                    }
                })
                .foreach((VoidFunction<File>) file -> System.out.println(file.getName()));

        jsc.close();
    }
}
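The magic-number filter used above can be tried out without Spark. This stdlib-only sketch writes a real zip and a plain text file (both temp files, used here only for illustration) and applies the same check: `0x504b0304` is the `PK\x03\x04` local-file-header signature, which `DataInputStream.readInt()` reads big-endian from the first four bytes.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipMagicCheck {
    // Same check as the Spark filter: the first four bytes of a zip are 0x50 0x4B 0x03 0x04
    static boolean isZip(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return in.readInt() == 0x504b0304;
        }
    }

    public static void main(String[] args) throws IOException {
        // A real zip: starts with the local file header signature
        Path zip = Files.createTempFile("demo", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.closeEntry();
        }

        // A plain text file: first four bytes are ordinary characters
        Path txt = Files.createTempFile("demo", ".txt");
        Files.write(txt, "not a zip".getBytes());

        System.out.println("zip -> " + isZip(zip.toFile()));
        System.out.println("txt -> " + isZip(txt.toFile()));
    }
}
```

One caveat worth knowing: this check only matches archives that begin with a local file header, so an empty zip (which starts with the end-of-central-directory record) would be filtered out.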