
Set Spark configuration in AWS Glue PySpark

I am using AWS Glue with PySpark and want to add a couple of configurations to the SparkSession, e.g.:

"spark.hadoop.fs.s3a.impl" = "org.apache.hadoop.fs.s3a.S3AFileSystem"
"spark.hadoop.fs.s3a.multiobjectdelete.enable" = "false"
"spark.serializer" = "org.apache.spark.serializer.KryoSerializer"
"spark.hadoop.fs.s3a.fast.upload" = "true"

The code I am using to initialise the context is the following:

from awsglue.context import GlueContext
from pyspark.context import SparkContext
glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

From what I understood from the documentation, these configurations should be added as job parameters when submitting the Glue job. Is that the case, or can they also be added when initializing the Spark session?
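To make the second option concrete, this is roughly what I have in mind (an untested sketch; since SparkContext.getOrCreate reuses an existing context, the conf would only be applied if Glue has not already created one):

from awsglue.context import GlueContext
from pyspark.conf import SparkConf
from pyspark.context import SparkContext

# Build the desired settings up front and hand them to the context on creation
conf = SparkConf()
conf.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
conf.set("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.hadoop.fs.s3a.fast.upload", "true")

sc = SparkContext.getOrCreate(conf=conf)
glueContext = GlueContext(sc)
spark = glueContext.spark_session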

This doesn't seem to error out, but I'm not sure whether it's actually taking effect:

# Hadoop-level options are set on hadoopConfiguration() directly; the keys
# should not carry the "spark.hadoop." prefix there (that prefix is only
# stripped when the options come in through SparkConf)
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3.maxRetries", "20")
hadoop_conf.set("fs.s3.consistent.retryPolicyType", "exponential")
