AWS Glue job getting Access Denied when writing to S3
I have a Glue ETL job, created by CloudFormation. This job extracts data from RDS Aurora and writes to S3.
When I run this job, I get the error below.
The job has an IAM service role.
This service role allows
I have the same error whether the S3 bucket is encrypted with AES256 or with aws:kms.
I get the same error whether the job has a Security Configuration or not.
I have a job doing exactly the same thing that I created manually, and it runs successfully without a Security Configuration.
What am I missing? Here's the full error log:
File "/mnt/yarn/usercache/root/appcache/application_1...5_0002/container_15...45_0002_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o145.pyWriteDynamicFrame.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 2.0 failed 4 times, most recent failure: Lost task 3.3 in stage 2.0 (TID 30, ip-10-....us-west-2.compute.internal, executor 1): com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: F...49), S3 Extended Request ID: eo...wXZw=
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588
Unfortunately the error doesn't tell us much, except that it's failing during the write of your DynamicFrame.
There are only a handful of possible reasons for the 403; you can check whether you have covered them all:
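For example, the role doing the write typically needs object-level and bucket-level S3 actions, plus KMS actions when the bucket enforces SSE-KMS. A minimal policy sketch as a Python dict (the bucket name and KMS key ARN are hypothetical; substitute your own):

```python
import json

# Hypothetical bucket and KMS key; substitute your own values.
BUCKET = "my-target-bucket"
KMS_KEY_ARN = "arn:aws:kms:us-west-2:111122223333:key/example"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Object-level actions apply to keys inside the bucket.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {
            # ListBucket/GetBucketLocation apply to the bucket itself.
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": f"arn:aws:s3:::{BUCKET}",
        },
        {
            # Needed only when the bucket enforces SSE-KMS.
            "Effect": "Allow",
            "Action": ["kms:GenerateDataKey", "kms:Decrypt"],
            "Resource": KMS_KEY_ARN,
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Writes missing any one of these (object actions on `bucket/*`, list actions on the bucket, key usage under SSE-KMS) surface as the same generic 403.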
If none of the above works, you could shed some more light on your setup. Perhaps share the code for the write operation.
In addition to Lydon's answer, error 403 is also received if your Data Source location is the same as the Data Target, as defined when creating a job in Glue. Change either of these if they are identical and the issue will be resolved.
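A quick way to guard against that is to compare the two locations before writing. A sketch (the paths are hypothetical; trailing slashes are normalized so cosmetically different URIs still compare equal):

```python
from urllib.parse import urlparse

def same_s3_location(source: str, target: str) -> bool:
    """Return True when two s3:// URIs point at the same bucket and prefix."""
    s, t = urlparse(source), urlparse(target)
    # Strip trailing slashes so "s3://b/x/" and "s3://b/x" compare equal.
    return (s.netloc, s.path.rstrip("/")) == (t.netloc, t.path.rstrip("/"))

# Hypothetical locations from a job definition.
source = "s3://my-bucket/exports/"
target = "s3://my-bucket/exports"

print(same_s3_location(source, target))  # these two are effectively identical
```

Running such a check in the job script (or in the CloudFormation template's validation step) turns a confusing runtime 403 into an explicit configuration error.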
How are you providing permission for PassRole to the Glue role?
{
  "Sid": "AllowAccessToRoleOnly",
  "Effect": "Allow",
  "Action": [
    "iam:PassRole",
    "iam:GetRole",
    "iam:GetRolePolicy",
    "iam:ListRolePolicies",
    "iam:ListAttachedRolePolicies"
  ],
  "Resource": "arn:aws:iam::*:role/<role>"
}
Usually we create roles using <project>-<role>-<env>, e.g. xyz-glue-dev, where the project name is xyz and env is dev. In that case we use "Resource": "arn:aws:iam::*:role/xyz-*-dev".
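You can sanity-check which role ARNs such a wildcard Resource would cover. The sketch below uses `fnmatch`, whose `*` glob is a rough approximation of IAM's wildcard matching (the role ARNs are hypothetical):

```python
from fnmatch import fnmatch

# Resource pattern from the PassRole policy above.
resource_pattern = "arn:aws:iam::*:role/xyz-*-dev"

# Hypothetical role ARNs to test against the pattern.
candidates = [
    "arn:aws:iam::123456789012:role/xyz-glue-dev",    # matches
    "arn:aws:iam::123456789012:role/xyz-glue-prod",   # wrong env suffix
    "arn:aws:iam::123456789012:role/abc-glue-dev",    # wrong project prefix
]

for arn in candidates:
    # fnmatch's "*" matches any character run, similar to IAM's wildcard.
    print(arn, fnmatch(arn, resource_pattern))
```

If the Glue job's service role falls outside the pattern, the caller cannot pass it to the job and the run fails with an access error.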
For me it was two things.
After these two settings, my Glue job ran successfully. Hope this helps.
You should add a Security Configuration (under the Security tab on the Glue console), providing an S3 encryption mode of either SSE-KMS or SSE-S3.
Now select that security configuration while creating your job under Advanced Properties.
Duly verify your IAM role and S3 bucket policy. It will work.
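Since the job in the question is created by CloudFormation rather than through the console, the same security configuration can be created programmatically. A sketch of the request shape for boto3's `create_security_configuration` (the name is hypothetical, and the actual API call is commented out so the snippet runs without AWS credentials):

```python
import json

# Request shape for glue.create_security_configuration (boto3).
# The configuration name is hypothetical.
security_configuration = {
    "Name": "my-glue-sse-s3",
    "EncryptionConfiguration": {
        "S3Encryption": [
            {"S3EncryptionMode": "SSE-S3"},
            # For SSE-KMS instead, something like:
            # {"S3EncryptionMode": "SSE-KMS", "KmsKeyArn": "arn:aws:kms:..."},
        ]
    },
}

# import boto3
# boto3.client("glue").create_security_configuration(**security_configuration)

print(json.dumps(security_configuration, indent=2))
```

The configuration's name is then referenced from the job definition (the `SecurityConfiguration` property of the Glue job), just as selecting it under Advanced Properties does in the console.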
Have you done the required RDS parameter group configuration? I don't see any reference to this in your question. I am assuming you have missed certain configurations in the RDS parameter groups needed to read/write between S3 and RDS. If this is not done, please refer to this link and do the required configuration. This should work.
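For context on what that answer refers to: on Aurora MySQL, S3 integration is driven by cluster parameters that name an IAM role. A sketch of the parameters involved (the role ARN is hypothetical, and you should double-check the exact parameter names for your engine version; the role must also be associated with the cluster itself):

```python
# Aurora MySQL cluster parameters that point S3 integration at an IAM role.
# The ARN is hypothetical; set these in the DB cluster parameter group and
# also attach the same role to the Aurora cluster.
ROLE_ARN = "arn:aws:iam::123456789012:role/aurora-s3-access"

cluster_parameters = {
    "aurora_load_from_s3_role": ROLE_ARN,    # LOAD DATA FROM S3
    "aurora_select_into_s3_role": ROLE_ARN,  # SELECT ... INTO OUTFILE S3
    "aws_default_s3_role": ROLE_ARN,         # fallback when the above are unset
}

for name, value in cluster_parameters.items():
    print(f"{name} = {value}")
```

Note these parameters govern Aurora's own S3 statements; a Glue job writing to S3 with its own service role does not depend on them, which may be why the accepted fixes above centered on IAM and encryption instead.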