Save dataframe as AVRO Spark 2.4.0

Since Spark 2.4.0 it's possible to save as AVRO without external jars. However, I can't get it working at all. My code looks like this:

key = 'filename.avro'
df.write.mode('overwrite').format("avro").save(key)

I get the following error:

pyspark.sql.utils.AnalysisException: 'Failed to find data source: avro. Avro is built-in but external data source module since Spark 2.4. Please deploy the application as per the deployment section of "Apache Avro Data Source Guide".;'

So I look at the Apache Avro Data Source Guide ( https://spark.apache.org/docs/latest/sql-data-sources-avro.html ) and it gives the following example:

df = spark.read.format("avro").load("examples/src/main/resources/users.avro")

df.select("name", "favorite_color").write.format("avro").save("namesAndFavColors.avro")

It is the same, so I'm lost. Does anyone have an idea what is going wrong?

The documentation you've linked clearly says that:

The spark-avro module is external and not included in spark-submit or spark-shell by default.

and further explains how to include the package.

So your statement:

Since Spark 2.4.0 it's possible to save as AVRO without external jars.

is just incorrect.

The spark-avro module is external and not included in spark-submit or spark-shell by default.

As with any Spark application, spark-submit is used to launch your application. spark-avro_2.11 and its dependencies can be added directly to spark-submit using --packages, such as:

./bin/spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.0 ...

For experimenting on spark-shell, you can also use --packages to add org.apache.spark:spark-avro_2.11 and its dependencies directly:

./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.0 ...
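The same applies to PySpark, which is what the question uses: pass the package to ./bin/pyspark or spark-submit with --packages, or set it on the SparkSession before the session starts. Below is a minimal sketch; the app name and sample data are made up for illustration, and the package coordinates assume the Spark 2.4.0 / Scala 2.11 build:

from pyspark.sql import SparkSession

# spark.jars.packages must be set before the JVM starts; if a session
# already exists, getOrCreate() returns it and this setting is ignored.
spark = (
    SparkSession.builder
    .appName("avro-write-example")  # hypothetical app name
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.0")
    .getOrCreate()
)

# Hypothetical sample data, just to have something to write.
df = spark.createDataFrame([("Alice", "red")], ["name", "favorite_color"])

# With the module on the classpath, format("avro") now resolves.
df.write.mode("overwrite").format("avro").save("filename.avro")

Equivalently, launching the shell with ./bin/pyspark --packages org.apache.spark:spark-avro_2.11:2.4.0 makes the original write call from the question work unchanged.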
