
Exception is not caught by the try catch block

I am saving a DStream to Cassandra. One of the Cassandra columns has the map<text, text> datatype. Cassandra does not support null values in maps, but null values can occur in the stream.

I have added a try/catch in case something goes wrong, but the program stopped despite it, and I don't see an error message in the log:

   try {
     cassandraStream.saveToCassandra("table", "keyspace")
   } catch {
     case e: Exception => log.error("Error in saving data in Cassandra: " + e.getMessage, e)
   }

Exception

Caused by: java.lang.NullPointerException: Map values cannot be null
    at com.datastax.driver.core.TypeCodec$AbstractMapCodec.serialize(TypeCodec.java:2026)
    at com.datastax.driver.core.TypeCodec$AbstractMapCodec.serialize(TypeCodec.java:1909)
    at com.datastax.driver.core.AbstractData.set(AbstractData.java:530)
    at com.datastax.driver.core.AbstractData.set(AbstractData.java:536)
    at com.datastax.driver.core.BoundStatement.set(BoundStatement.java:870)
    at com.datastax.spark.connector.writer.BoundStatementBuilder.com$datastax$spark$connector$writer$BoundStatementBuilder$$bindColumnUnset(BoundStatementBuilder.scala:73)
    at com.datastax.spark.connector.writer.BoundStatementBuilder$$anonfun$6.apply(BoundStatementBuilder.scala:84)
    at com.datastax.spark.connector.writer.BoundStatementBuilder$$anonfun$6.apply(BoundStatementBuilder.scala:84)
    at com.datastax.spark.connector.writer.BoundStatementBuilder$$anonfun$bind$1.apply$mcVI$sp(BoundStatementBuilder.scala:106)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at com.datastax.spark.connector.writer.BoundStatementBuilder.bind(BoundStatementBuilder.scala:101)
    at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:106)
    at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:31)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at com.datastax.spark.connector.writer.GroupingBatchBuilder.foreach(GroupingBatchBuilder.scala:31)
    at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$1.apply(TableWriter.scala:233)
    at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$1.apply(TableWriter.scala:210)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:112)
    at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:111)
    at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:145)
    at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
    at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:210)
    at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:197)
    at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:183)
    at com.datastax.spark.connector.streaming.DStreamFunctions$$anonfun$saveToCassandra$1$$anonfun$apply$1.apply(DStreamFunctions.scala:54)
    at com.datastax.spark.connector.streaming.DStreamFunctions$$anonfun$saveToCassandra$1$$anonfun$apply$1.apply(DStreamFunctions.scala:54)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    ... 3 more

I'd like to know why the program stops despite the try/catch block. Why is the exception not caught?

To understand the source of the failure, you have to acknowledge that DStreamFunctions.saveToCassandra, like DStream output operations in general, is not an action in the strict sense. In practice it just invokes foreachRDD:

 dstream.foreachRDD(rdd => rdd.sparkContext.runJob(rdd, writer.write _)) 

which in turn, per the foreachRDD documentation:

Apply a function to each RDD in this DStream. This is an output operator, so 'this' DStream will be registered as an output stream and therefore materialized.

The difference is subtle but important: the operation is registered, but the actual execution happens in a different context, at a later point in time.

This means there are no runtime failures to catch at the point where you invoke saveToCassandra.
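
A minimal plain-Scala sketch (an analogue of the pattern, not the actual connector code) illustrates this: the try block only registers the work, so the exception can only surface later, outside the block.

  // Plain-Scala analogue of registering an output operation: the try block
  // merely stores the work to run later, so nothing can throw inside it yet.
  var registeredJob: () => Unit = () => ()

  try {
    // analogue of saveToCassandra: registration only, no execution
    registeredJob = () => throw new NullPointerException("Map values cannot be null")
  } catch {
    case e: Exception => println("never reached: " + e.getMessage) // dead code
  }

  registeredJob() // the NPE is thrown here, outside the try/catch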

As already pointed out, try or Try would contain the driver exception if applied directly to an action. So you could, for example, re-implement saveToCassandra as:

dstream.foreachRDD(rdd => try {
  rdd.sparkContext.runJob(rdd, writer.write _)
} catch {
  case e: Exception => log.error("Error in saving data in Cassandra: " + e.getMessage, e)
})

With that, the stream should be able to proceed, although the current batch will be completely or partially lost.

It is important to note that this is not the same as catching the original exception, which will still be thrown, uncaught, and visible in the log. To catch the problem at its source you'd have to apply the try/catch block directly in the writer, and that is obviously not an option when you are executing code over which you have no control.
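
Just for illustration, if the per-record write were under your control, source-level handling could look roughly like this (a hypothetical sketch: writeRow and log stand in for code you would own; the connector exposes no such hook):

  dstream.foreachRDD { rdd =>
    rdd.foreachPartition { rows =>
      rows.foreach { row =>
        // hypothetical per-record write; the failure stays contained to this record
        try writeRow(row)
        catch { case e: Exception => log.error("Skipping bad record: " + row, e) }
      }
    }
  }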

The take-away message (already stated in this thread) is: make sure to sanitize your data to avoid known sources of failure.

The problem is that you are not catching the exception you think you are. The code you have would catch a driver exception, and in fact code structured like this will do so.

That doesn't, however, mean that

the program should never stop.

While the driver failure, which would be a consequence of a fatal executor failure, is contained and the driver can exit gracefully, the stream as such is already gone. Therefore your code exits, because there is no more stream to run.

If the code in question were under your control, exception handling could be delegated to the task, but in the case of third-party code there is no such option.

Instead you should validate your data and remove problematic records before they are passed to saveToCassandra.
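
As a sketch of such validation, assuming the stream carries a case class with a Map[String, String] field (the Record type and field names below are made up for illustration, not taken from the question):

  import com.datastax.spark.connector.streaming._ // adds saveToCassandra to DStream

  // Illustrative record type standing in for whatever the stream carries.
  case class Record(id: String, attributes: Map[String, String])

  // Drop null maps and null map values before they reach the writer.
  val sanitized = cassandraStream.map { r =>
    val attrs = Option(r.attributes).getOrElse(Map.empty[String, String])
    r.copy(attributes = attrs.filter { case (_, v) => v != null })
  }

  sanitized.saveToCassandra("keyspace", "table") // keyspace first, then table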
