
How to throw an exception in Spark Streaming

We have a Spark Streaming program which pulls messages from Kafka and processes each individual message inside a foreachPartition operation.

If there is a specific error in the processing function, we would like to throw the exception back and halt the program, but that does not seem to happen. Below is the code we are trying to execute.

JavaInputDStream<KafkaDTO> stream = KafkaUtils.createDirectStream(...);

stream.foreachRDD(new Function<JavaRDD<KafkaDTO>, Void>() {

    public Void call(JavaRDD<KafkaDTO> rdd) throws PropertiesLoadException, Exception {
        rdd.foreachPartition(new VoidFunction<Iterator<KafkaDTO>>() {

            @Override
            public void call(Iterator<KafkaDTO> itr) throws PropertiesLoadException, Exception {
                while (itr.hasNext()) {
                    KafkaDTO dto = itr.next();
                    try {
                        // process the message here
                    } catch (PropertiesLoadException e) {
                        // rethrow if the property file is not found
                        throw new PropertiesLoadException("PropertiesLoadException: " + e.getMessage());
                    } catch (Exception e) {
                        throw new Exception("Exception: " + e.getMessage());
                    }
                }
            }
        });
        return null; // Function<JavaRDD<KafkaDTO>, Void> must return a value
    }
});

In the above code, even when we throw a PropertiesLoadException, the program doesn't halt and streaming continues. The max retries we set in the Spark configuration is only 4, yet the streaming program keeps running even after 4 failures. How should the exception be thrown to stop the program?
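For reference, the retry limit mentioned above is presumably the task-failure limit. A minimal sketch of how such a setting is typically applied (the config key spark.task.maxFailures and the app name here are assumptions, not taken from the question):

SparkConf sparkConf = new SparkConf()
        .setAppName("kafka-streaming-app")    // illustrative name
        .set("spark.task.maxFailures", "4");  // a task is retried up to this many times before the stage fails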

I am not sure if this is the best approach, but we surrounded the main batch with a try/catch, and when I get an exception I just close the context. In addition, you need to make sure that graceful stop is off (false).

Example code:

try {
    process(dataframe);
} catch (Exception e) {
    logger.error("Failed on write - will stop spark context immediately!! " + e.getMessage());
    // stop the streaming context so the driver can exit
    closeContext(jssc);
    // restore the interrupt flag if the failure was an interruption
    if (e instanceof InterruptedException) {
        Thread.currentThread().interrupt();
    }
    throw e;
}

And the close function:

private void closeContext(JavaStreamingContext jssc) {
    logger.warn("stopping the context");
    // stop(stopSparkContext, stopGracefully) - stop gracefully only if the config enables it
    jssc.stop(false, jssc.sparkContext().getConf().getBoolean("spark.streaming.stopGracefullyOnShutdown", false));
    logger.error("Context was stopped");
}
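For context, stopping the context halts the application because the driver's main thread is normally blocked in awaitTermination(). A minimal sketch of that driver sequence (start/awaitTermination are standard Spark Streaming API; the surrounding structure is an assumption):

jssc.start();
// blocks until jssc.stop(...) is called (e.g. by closeContext above)
// or until the context fails; the driver then falls through and exits
jssc.awaitTermination();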

In the config:

spark.streaming.stopGracefullyOnShutdown false
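The same flag can also be passed at submit time (illustrative; adjust to your deployment):

spark-submit --conf spark.streaming.stopGracefullyOnShutdown=false ...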

I think that with your code it should look like this:

JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, streamBatch);
JavaInputDStream<KafkaDTO> stream = KafkaUtils.createDirectStream(jssc, ...);

stream.foreachRDD(new Function<JavaRDD<KafkaDTO>, Void>() {

    public Void call(JavaRDD<KafkaDTO> rdd) throws PropertiesLoadException, Exception {

        try {
            rdd.foreachPartition(new VoidFunction<Iterator<KafkaDTO>>() {

                @Override
                public void call(Iterator<KafkaDTO> itr) throws PropertiesLoadException, Exception {
                    while (itr.hasNext()) {
                        KafkaDTO dto = itr.next();
                        try {
                            // process the message here
                        } catch (PropertiesLoadException e) {
                            // rethrow if the property file is not found
                            throw new PropertiesLoadException("PropertiesLoadException: " + e.getMessage());
                        } catch (Exception e) {
                            throw new Exception("Exception: " + e.getMessage());
                        }
                    }
                }
            });

        } catch (Exception e) {
            // a failure inside foreachPartition surfaces here on the driver:
            // stop the context and rethrow so the application exits
            logger.error("Failed on write - will stop spark context immediately!! " + e.getMessage());
            closeContext(jssc);
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            throw e;
        }

        return null;
    }
});

In addition, please note that my stream runs on Spark 2.1 standalone (not YARN/Mesos) in client mode, and that I implemented the graceful stop myself using ZooKeeper (ZK).
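That ZK-based graceful stop is not shown in this answer; a minimal sketch of the general idea, where the driver polls a "stop" znode and shuts down gracefully when it appears (the znode path, connection string, timing, and error handling are all assumptions, not my actual code):

// driver-side loop; ZooKeeper exceptions omitted for brevity
ZooKeeper zk = new ZooKeeper("zkhost:2181", 10000, event -> { /* no-op watcher */ });
jssc.start();
boolean stopped = false;
while (!stopped) {
    // returns true if the context terminated within the timeout
    stopped = jssc.awaitTerminationOrTimeout(5000);
    if (!stopped && zk.exists("/myapp/stop", false) != null) {
        // stop(stopSparkContext = true, stopGracefully = true):
        // finish in-flight batches before shutting down
        jssc.stop(true, true);
        stopped = true;
    }
}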
