
Set spark configuration in AWS Glue pySpark

I am using AWS Glue with pySpark and want to add a couple of configurations to the sparkSession, e.g. "spark.hadoop.fs.s3a.impl" = "org.apache.hadoop.fs.s3a.S3AFileSystem", "spark.hadoop.fs.s3a.multiobjectdelete.enable" = "false", "spark.serializer" = "org.apache.spark.serializer.KryoSerializer", "spark.hadoop.fs.s3a.fast.upload" = "true". The code I am using to initialise the context is the following:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())
spark = glueContext.spark_session

From what I understood from the documentation, I should add these configs as job parameters when submitting the Glue job. Is that the case, or can they also be added when initializing the Spark session?
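
For reference, this is roughly what I had in mind for setting them at initialization. It is only a sketch using the standard pySpark SparkConf API; I have not confirmed that Glue picks these values up when the context is created this way:

from awsglue.context import GlueContext
from pyspark.context import SparkContext
from pyspark.conf import SparkConf

# Sketch: collect the settings in a SparkConf and hand the resulting
# SparkContext to GlueContext (unverified inside a Glue job).
conf = SparkConf()
conf.set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
conf.set("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false")
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.hadoop.fs.s3a.fast.upload", "true")

glueContext = GlueContext(SparkContext.getOrCreate(conf=conf))
spark = glueContext.spark_session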

I also tried the following, which doesn't seem to be erroring out, but I'm not sure whether it actually takes effect:

# Setting the values directly on the underlying Hadoop configuration
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("spark.hadoop.fs.s3.maxRetries", "20")
hadoop_conf.set("spark.hadoop.fs.s3.consistent.retryPolicyType", "exponential")
