I am using AWS Glue with PySpark and want to set a few configurations on the SparkSession, e.g.:

("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false")
("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
("spark.hadoop.fs.s3a.fast.upload", "true")

The code I am using to initialise the context is the following:
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session
From what I understood from the documentation, I should pass these confs as job parameters when submitting the Glue job. Is that the case, or can they also be set when initialising the Spark context?
The following doesn't error out, but I'm not sure it is actually taking effect. Note that keys set directly on the Hadoop Configuration should not carry the "spark.hadoop." prefix; Spark only strips that prefix when the property is set as a Spark conf, so with the prefix Hadoop never reads the key:

hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3.maxRetries", "20")
hadoop_conf.set("fs.s3.consistent.retryPolicyType", "exponential")