
org.apache.spark.SparkException: Writing job aborted on Databricks

I have used Databricks to ingest data from Event Hub and process it in real time with PySpark Structured Streaming. The code works fine, but after this line:

df.writeStream.trigger(processingTime='100 seconds').queryName("myquery")\
  .format("console").outputMode('complete').start()

I'm getting the following error:

org.apache.spark.SparkException: Writing job aborted.
Caused by: java.io.InvalidClassException: org.apache.spark.eventhubs.rdd.EventHubsRDD; local class incompatible: stream classdesc

I have read that this could be due to low processing power, but I am using a Standard_F4 machine in standard cluster mode with autoscaling enabled.

Any ideas?
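For reference, df is assumed to be an Event Hubs source stream created roughly like the sketch below; the connection string is a placeholder, and newer versions of the connector also expect it to be encrypted with EventHubsUtils.encrypt before being passed as an option.

# Hypothetical setup for the Event Hubs source used above; all values are placeholders.
connection_string = "Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<name>;SharedAccessKey=<key>;EntityPath=<event-hub>"

ehConf = {
    # Newer connector versions require encrypting this value first, e.g. with
    # sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
    "eventhubs.connectionString": connection_string
}

# Read from Event Hubs as a streaming DataFrame (requires the azure-eventhubs-spark connector).
df = spark.readStream.format("eventhubs").options(**ehConf).load()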

This looks like a JAR conflict. Go to your cluster's Spark JARs folder and check whether there are multiple JARs named azure-eventhubs-spark_XXX.XX. You have probably installed different versions of the connector, so remove every duplicate JAR with that name (see the sketch after the config example below). The error can also occur when the connector version is incompatible with your other JARs. Instead of uploading the JAR manually, try adding it through the Spark config:

from pyspark.sql import SparkSession

# Let Spark resolve the connector (and its dependencies) from Maven
# instead of relying on a manually uploaded JAR.
spark = SparkSession \
    .builder \
    .appName('my-spark') \
    .config('spark.jars.packages', 'com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.12') \
    .getOrCreate()

This way Spark will download the JAR files (and their transitive dependencies) through Maven.
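To confirm that only one copy of the connector is left on the classpath, you can list the JARs from a notebook cell. This is a minimal sketch that assumes the standard /databricks/jars location; adjust the path if your libraries were installed somewhere else.

import glob
import os

# Look for copies of the Event Hubs connector in the standard Databricks JAR directory.
# Adjust this path if your cluster stores libraries elsewhere (e.g. a DBFS library path).
jar_dir = "/databricks/jars"

found = sorted(glob.glob(os.path.join(jar_dir, "*azure*eventhubs*spark*.jar")))
for jar in found:
    print(os.path.basename(jar))

# More than one version printed here means conflicting copies of the connector:
# uninstall the duplicates (or the manually uploaded JARs) and restart the cluster.

After removing any duplicates, restart the cluster so the old classes are no longer loaded on the driver and executors.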
