Setting S3 output file grantees for Spark output files

I'm running Spark on AWS EMR and I'm having trouble getting the correct permissions on the output files (rdd.saveAsTextFile('<file_dir_name>')). In Hive, I would add the line set fs.s3.canned.acl=BucketOwnerFullControl at the beginning of the script, and that would set the correct permissions. For Spark, I tried running:

hadoop jar /mnt/var/lib/hadoop/steps/s-3HIRLHJJXV3SJ/script-runner.jar \
/home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster \
--conf "spark.driver.extraJavaOptions -Dfs.s3.canned.acl=BucketOwnerFullControl" \
hdfs:///user/hadoop/spark.py

But the permissions do not get set properly on the output files. What is the proper way to pass fs.s3.canned.acl=BucketOwnerFullControl, or any of the other S3 canned ACLs, to the Spark job?

Thanks in advance.

I found the solution. Inside the job, you have to access the JavaSparkContext, get the Hadoop configuration from it, and set the parameter there. For example:

sc._jsc.hadoopConfiguration().set('fs.s3.canned.acl','BucketOwnerFullControl')
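As a slightly fuller sketch of the same idea, the setting can be applied through a small helper before any output is written. The names apply_hadoop_conf and run_job and the app name are assumptions made for illustration, not part of the original answer. Note that sc._jsc is a private PySpark attribute, so its behaviour may differ between Spark versions.

```python
def apply_hadoop_conf(sc, settings):
    """Copy each key/value pair into the Hadoop configuration behind sc.

    sc is expected to be a pyspark SparkContext; _jsc is its private
    handle to the underlying JavaSparkContext.
    """
    hadoop_conf = sc._jsc.hadoopConfiguration()
    for key, value in settings.items():
        hadoop_conf.set(key, value)
    return hadoop_conf


def run_job():
    """Hypothetical job body; call this from the script given to spark-submit."""
    # Imported here so the helper above can be used without pyspark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="canned-acl-example")  # hypothetical app name
    # Set the ACL before writing, so the S3 writer sees it.
    apply_hadoop_conf(sc, {"fs.s3.canned.acl": "BucketOwnerFullControl"})
    rdd = sc.parallelize(["a", "b", "c"])
    rdd.saveAsTextFile("<file_dir_name>")  # placeholder path from the question
    sc.stop()
```

The configuration is set before saveAsTextFile is called because the write is what reads the Hadoop settings.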

The proper way to pass Hadoop config keys in Spark is to use --conf with each key prefixed by spark.hadoop. (note the trailing dot). Your command would look like:

hadoop jar /mnt/var/lib/hadoop/steps/s-3HIRLHJJXV3SJ/script-runner.jar \
/home/hadoop/spark/bin/spark-submit --deploy-mode cluster --master yarn-cluster \
--conf "spark.hadoop.fs.s3.canned.acl=BucketOwnerFullControl" \
hdfs:///user/hadoop/spark.py
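When several Hadoop keys need to be passed this way, the prefixing can be done programmatically. The following is a minimal sketch; the function names hadoop_conf_args and build_submit_command are hypothetical, and the bare spark-submit executable name is an assumption (the question uses /home/hadoop/spark/bin/spark-submit wrapped in script-runner.jar).

```python
import shlex

# Prefix that marks a Spark property as a Hadoop configuration key.
HADOOP_PREFIX = "spark.hadoop."


def hadoop_conf_args(settings):
    """Turn {hadoop_key: value} into a flat list of --conf arguments."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{HADOOP_PREFIX}{key}={value}"]
    return args


def build_submit_command(script, settings, spark_submit="spark-submit"):
    """Assemble a spark-submit argument list with the flags from the question."""
    cmd = [spark_submit, "--deploy-mode", "cluster", "--master", "yarn-cluster"]
    cmd += hadoop_conf_args(settings)
    cmd.append(script)
    return cmd


command = build_submit_command(
    "hdfs:///user/hadoop/spark.py",
    {"fs.s3.canned.acl": "BucketOwnerFullControl"},
)
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in command))
```

Building the command as a list rather than a single string avoids quoting mistakes if it is later handed to subprocess.run.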

Unfortunately, I could not find any reference to this in the official Spark documentation.
