
Spark/Java serializable issue - org.apache.spark.SparkException: Task not serializable

I am having an issue with the following code while writing a Spark application in Java:

public class BatchLayerDefaultJob implements Serializable {

    private static Function<BatchLayerProcessor, Future> batchFunction = new Function<BatchLayerProcessor, Future>() {
        @Override
        public Future call(BatchLayerProcessor s) {
            return executor.submit(s);
        }
    };

    public void applicationRunner(BatchParameters batchParameters) {
        SparkConf sparkConf = new SparkConf().setAppName("Platform Engine - Batch Job");
        sparkConf.set("spark.driver.allowMultipleContexts", "true");
        sparkConf.set("spark.cores.max", "1");
        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        List<BatchLayerProcessor> batchListforRDD = new ArrayList<BatchLayerProcessor>();

        // populate List here... then attempt to process below

        JavaRDD<BatchLayerProcessor> distData = sparkContext.parallelize(batchListforRDD, batchListforRDD.size());
        JavaRDD<Future> result = distData.map(batchFunction);
        result.collect(); // <-- Produces an object-not-serializable exception here
    }
}

I have tried a number of things to no avail, including extracting batchFunction into a separate class outside the influence of the main class, and using mapPartitions instead of map. I am more or less out of ideas; any help is appreciated.
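As an aside, declaring `implements Serializable` is not by itself enough: Java serialization walks the whole object graph, so any non-transient field of a non-serializable type (an `ExecutorService`, a Spring context, a `Future`) still triggers `NotSerializableException`. A minimal Spark-free sketch of that failure mode (all class names here are hypothetical, not from the code above):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {

    // Stand-in for a non-serializable dependency such as an ExecutorService.
    static class NonSerializableDep { }

    // Declares Serializable, but holds a non-serializable field,
    // so serializing an instance still fails.
    static class Task implements Serializable {
        private final NonSerializableDep dep = new NonSerializableDep();
    }

    // Returns true if Java serialization of 'o' succeeds.
    static boolean isJavaSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isJavaSerializable("a plain String")); // true
        System.out.println(isJavaSerializable(new Task()));       // false
    }
}
```

The same reasoning applies to the `JavaRDD<Future>` above: `Future` implementations are generally not serializable, so even if the map call succeeded on the executors, `collect()` could not ship the results back.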

Stack trace below:

17/11/30 17:11:28 INFO DAGScheduler: Job 0 failed: collect at 
BatchLayerDefaultJob.java:122, took 23.406561 s
Exception in thread "Thread-8" org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 0, not attempting to retry it. Exception during serialization: 
java.io.NotSerializableException: xxxx.BatchLayerProcessor
Serialization stack:
- object not serializable (class: xxxx.BatchLayerProcessor, value: xxxx.BatchLayerProcessor@3e745097)
- element of array (index: 0)
- array (class [Ljava.lang.Object;, size 1)
- field (class: scala.collection.mutable.WrappedArray$ofRef, name: array, type: class [Ljava.lang.Object;)
- object (class scala.collection.mutable.WrappedArray$ofRef, WrappedArray(xxxx.BatchLayerProcessor@3e745097))
- writeObject data (class: org.apache.spark.rdd.ParallelCollectionPartition)
- object (class org.apache.spark.rdd.ParallelCollectionPartition, org.apache.spark.rdd.ParallelCollectionPartition@691)
- field (class: org.apache.spark.scheduler.ResultTask, name: partition, type: interface org.apache.spark.Partition)
- object (class org.apache.spark.scheduler.ResultTask, ResultTask(0, 0))
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)

Cheers.

EDIT: Added BatchLayerProcessor as requested (slightly truncated):

public class BatchLayerProcessor implements Runnable, Serializable {
    private int interval, backMinutes;
    private String scoreVal, batchjobid;
    private static CountDownLatch countDownLatch;

    public void run() {
        /* Get a reference to the ApplicationContextReader, a singleton. */
        ApplicationContextReader applicationContextReaderCopy = ApplicationContextReader.getInstance();

        synchronized (BatchLayerProcessor.class) { /* Protect the singleton member variable from multithreaded access. */
            if (applicationContextReader == null) /* If the local reference is null... */
                applicationContextReader = applicationContextReaderCopy; /* ...set it to the singleton. */
        }

        if (getxScoreVal().equals("")) {
            applicationContextReader.getScoreService().calculateScores(applicationContextReader.getFunctions(), getInterval(), getBackMinutes(), getScoreVal(), true, getTimeInterval(), getIncludes(), getExcludes());
        }
        else {
            applicationContextReader.getScoreService().calculateScores(applicationContextReader.getFunctions(), getInterval(), getBackMinutes(), getScoreVal(), true, getTimeInterval(), getIncludes(), getExcludes());
        }

        countDownLatch.countDown();
    }
}

I decided to change BatchLayerProcessor so that it is no longer a Runnable, and instead to rely on Spark to do that work for me.
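The shape of that change can be sketched without Spark: instead of distributing Runnables that are submitted to a local ExecutorService (whose Futures cannot be serialized back), each element carries only plain serializable parameters, and the map function itself does the work and returns a serializable result. A minimal sketch using a Java stream in place of an RDD (class and field names are hypothetical):

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class BatchSketch {

    // Only plain, serializable parameters travel with the task;
    // no Runnable, no ExecutorService, no Future.
    static class BatchParams implements Serializable {
        final int interval;
        final int backMinutes;

        BatchParams(int interval, int backMinutes) {
            this.interval = interval;
            this.backMinutes = backMinutes;
        }
    }

    // The work happens inside the map function and returns a serializable
    // result (a String here). With Spark, this body would live in the
    // Function passed to JavaRDD.map, and run on the executors.
    static String process(BatchParams p) {
        return "processed interval=" + p.interval + " back=" + p.backMinutes;
    }

    public static void main(String[] args) {
        List<BatchParams> batch = Arrays.asList(new BatchParams(5, 60), new BatchParams(10, 120));

        // Spark's distData.map(batchFunction) followed by collect()
        // has the same shape as this stream pipeline:
        List<String> results = batch.stream()
                .map(BatchSketch::process)
                .collect(Collectors.toList());

        results.forEach(System.out::println);
    }
}
```

With this structure Spark parallelizes the work itself, so the local CountDownLatch/executor machinery becomes unnecessary.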
