
Getting InvocationTargetException when running my Glue job

I am trying to understand why the following error occurs.

"Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.SparkSession. java.lang.reflect.InvocationTargetException"

Basically, I am trying to use the delta module to perform an "upsert" on my table in a Glue job.

When I run the following code, I get the error mentioned above.

```
from delta import *
from pyspark.sql.session import SparkSession
spark = SparkSession \
          .builder \
          .config("spark.jars.packages", "io.delta:delta-core_2.11:0.5.0") \
          .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
          .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
          .getOrCreate()
```

This is the only piece of code I run, and it produces the error. Do you have any idea why this is happening?

Most likely this is a version mismatch: you are probably running Glue 3.0. There were some workarounds for using Delta with Glue 2.0, but they can produce this kind of error when you try them on Glue 3.0. Also, some parameters do not take effect when you set them on the Spark session builder inside the script, and as far as I can tell which ones depends on the Glue version.
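To make the mismatch concrete, here is a minimal sketch that compares the Scala suffix in the question's Maven coordinate with the Scala version each Glue runtime ships. The `GLUE_RUNTIME` table and both helper names are my own illustration, filled in from the AWS Glue release notes and the Delta Lake compatibility table as I remember them, so please verify them against the current documentation.

```python
# Assumed mapping of Glue version to Spark/Scala versions (not from the
# original post; check the AWS Glue release notes before relying on it).
GLUE_RUNTIME = {
    "2.0": {"spark": "2.4", "scala": "2.11"},
    "3.0": {"spark": "3.1", "scala": "2.12"},
    "4.0": {"spark": "3.3", "scala": "2.12"},
}


def scala_suffix(coordinate):
    """Return the Scala version embedded in 'group:artifact_<scala>:version'."""
    artifact = coordinate.split(":")[1]
    return artifact.rsplit("_", 1)[1]


def matches_runtime(coordinate, glue_version):
    """True when the artifact was built for the Scala version of that Glue runtime."""
    return scala_suffix(coordinate) == GLUE_RUNTIME[glue_version]["scala"]


# The coordinate used in the question.
coordinate = "io.delta:delta-core_2.11:0.5.0"
for glue_version in GLUE_RUNTIME:
    print(glue_version, matches_runtime(coordinate, glue_version))
```

With these assumed values, the `_2.11` artifact only lines up with Glue 2.0. Note also that, to my knowledge, `DeltaSparkSessionExtension` and `DeltaCatalog` only exist from Delta 0.7.0 onwards, so a 0.5.0 jar would not provide them on any runtime.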

But no worries: AWS has announced Glue 4.0; see the official announcement.

AWS also has an official guide on using Delta with Glue; below I state the key points to make it work.

The first and trickiest part is passing the configuration for Delta. You can now do it the Glue 4.0 way. In the older versions, you did this by nesting further conf settings inside the value of the conf parameter in the Glue job parameters :)

Key: --conf
Value: spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog

You also have to set the --datalake-formats job parameter to delta.
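The two job parameters above can be put together as a plain dictionary, which is roughly what you would hand to the job's default arguments. This is only a sketch: `build_glue_conf_value` and `default_arguments` are hypothetical names of mine, and `--datalake-formats` is the parameter name as it appears in the AWS Glue documentation.

```python
def build_glue_conf_value(settings):
    """Join Spark settings into the single string Glue expects as the --conf value.

    Glue accepts only one --conf job parameter, so every setting after the
    first is chained inside the value with its own ' --conf ' prefix.
    """
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return " --conf ".join(pairs)


# The two Delta settings from the answer.
delta_settings = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}

# Hypothetical job-parameter dictionary for a Glue job.
default_arguments = {
    "--conf": build_glue_conf_value(delta_settings),
    "--datalake-formats": "delta",
}

for key, value in default_arguments.items():
    print(key, value)
```

Building the value from a dictionary keeps the chained `--conf` string from being mistyped when more settings are added later.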

After that, make sure you have selected Glue 4.0. Also make sure to handle the symlink manifest files, either in your scripts or by using crawlers.
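Once the job is configured, the upsert from the question and the manifest handling could look roughly like the sketch below. It assumes the Delta Lake Python API (`delta.tables.DeltaTable`) is available in the job; the function name, the S3 path, and the `id` key column are placeholders of mine and not from the original post.

```python
def upsert_to_delta(spark, updates_df, table_path, key_column):
    """Merge updates_df into the Delta table at table_path, then refresh its manifest."""
    # Imported lazily so this module still loads where the delta package is absent.
    from delta.tables import DeltaTable

    target = DeltaTable.forPath(spark, table_path)

    # Upsert: update rows whose key matches, insert the rest.
    (
        target.alias("t")
        .merge(updates_df.alias("s"), f"t.{key_column} = s.{key_column}")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

    # Regenerate the symlink manifest so manifest-based readers see the new files.
    target.generate("symlink_format_manifest")


# Hypothetical usage inside a Glue job script (placeholder path and column):
# spark = glue_context.spark_session
# upsert_to_delta(spark, updates_df, "s3://my-bucket/my-table/", "id")
```

Regenerating the manifest in the same function is one way to handle it in the script; the alternative mentioned above is to leave that to a crawler.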

If you want more flexibility, you can also use the AWS EMR service; there is a walkthrough on using Delta there.
