Does Spark allow using an Amazon assumed role and STS temporary credentials for cross-account Glue access on EMR?
We are trying to connect an EMR Spark job to a cross-account AWS Glue Data Catalog. From my research, AWS supports cross-account access to the Glue catalog in two ways.
The problem scenario: Account A creates an EMR cluster with its role role_Account_A, and role_Account_A wants to access the Glue catalog of Account B.
- Account A creates the EMR cluster with role role_Account_A.
- Account B has role role_Account_B, which has access to Glue and S3 and lists role_Account_A as a trusted entity.
- role_Account_A has an sts:AssumeRole policy for the resource role_Account_B.
- Using the AWS SDK, we are able to assume role role_Account_B from role_Account_A and obtain temporary credentials.
- EMR has the configuration [{"classification":"spark-hive-site","properties":{"hive.metastore.glue.catalogid":"Account_B", "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"}}]
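The assume-role step described above can be sketched with the AWS SDK for Java v1 (matching the `assumedcreds.getAccessKeyId()` style used below); the role ARN and session name here are placeholders, not values from the question:

```java
import com.amazonaws.services.securitytoken.AWSSecurityTokenService;
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClientBuilder;
import com.amazonaws.services.securitytoken.model.AssumeRoleRequest;
import com.amazonaws.services.securitytoken.model.Credentials;

public class AssumeGlueRole {
    public static Credentials assumeRoleB() {
        AWSSecurityTokenService sts = AWSSecurityTokenServiceClientBuilder.defaultClient();
        AssumeRoleRequest request = new AssumeRoleRequest()
                // Placeholder ARN for role_Account_B in Account B
                .withRoleArn("arn:aws:iam::Account_B:role/role_Account_B")
                .withRoleSessionName("emr-glue-cross-account")
                .withDurationSeconds(3600);
        // Returns temporary access key, secret key, and session token
        return sts.assumeRole(request).getCredentials();
    }
}
```

These temporary credentials are what get passed into the Hadoop configuration in the Spark code below.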
// Build the session with Hive support so the Glue catalog client factory is used
SparkSession sparkSession = SparkSession.builder()
        .appName("testing glue")
        .enableHiveSupport()
        .getOrCreate();

// Hand the assumed-role temporary credentials to the S3A connector
sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", assumedcreds.getAccessKeyId());
sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.secret.key", assumedcreds.getSecretAccessKey());
sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.session.token", assumedcreds.getSessionToken());
sparkSession.sparkContext().conf().set("fs.s3a.access.key", assumedcreds.getAccessKeyId());
sparkSession.sparkContext().conf().set("fs.s3a.secret.key", assumedcreds.getSecretAccessKey());
sparkSession.sparkContext().conf().set("fs.s3a.session.token", assumedcreds.getSessionToken());

sparkSession.sql("show databases").show(10, false);
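As an aside, Hadoop's S3A connector also supports per-bucket configuration, so the temporary credentials could be scoped to just Account B's bucket rather than set globally. A sketch, reusing `sparkSession` and `assumedcreds` from the code above (the bucket name is hypothetical):

```java
// Hypothetical bucket owned by Account B; per-bucket keys override the global fs.s3a.* settings
String bucket = "account-b-bucket";
org.apache.hadoop.conf.Configuration conf = sparkSession.sparkContext().hadoopConfiguration();
conf.set("fs.s3a.bucket." + bucket + ".aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
conf.set("fs.s3a.bucket." + bucket + ".access.key", assumedcreds.getAccessKeyId());
conf.set("fs.s3a.bucket." + bucket + ".secret.key", assumedcreds.getSecretAccessKey());
conf.set("fs.s3a.bucket." + bucket + ".session.token", assumedcreds.getSessionToken());
```

Note that this only affects S3 reads and writes; the Glue catalog calls made by the Hive metastore client are a separate credential path.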
The error we are getting is:
Caused by: MetaException(message:User: arn:aws:sts::Account_A:assumed-role/role_Account_A/i-xxxxxxxxxxxx is not authorized to perform: glue:GetDatabase on resource: arn:aws:glue:XX-XXXX-X:Account_B:catalog
because no resource-based policy allows the glue:GetDatabase action (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: X93Xbc64-0153-XXXX-XXX-XXXXXXX))
Questions:
- Does Spark support Glue-specific authentication properties, for example aws.glue.access.key?
- Per the error, Spark is not using the assumed role role_Account_B; it uses role_Account_A, with which the EMR cluster was created. Can we make it use the assumed role role_Account_B?
I will update the question details if I am missing something.
Did you find the solution? I'm facing the exact same issue! Thank you.
I believe you have an EMR instance profile role in Account A. If so, you would have to follow these steps, and cross-account access should work.
In Account B:
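The error in the question ("no resource-based policy allows the glue:GetDatabase action") points at attaching a resource policy to Account B's Glue catalog that grants Account A's role access. A minimal sketch of such a policy (region, account IDs, and the broad glue:* action are placeholders to narrow down for your case):

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::Account_A:role/role_Account_A" },
      "Action": "glue:*",
      "Resource": [
        "arn:aws:glue:region:Account_B:catalog",
        "arn:aws:glue:region:Account_B:database/*",
        "arn:aws:glue:region:Account_B:table/*/*"
      ]
    }
  ]
}
```

This can be applied with `aws glue put-resource-policy --policy-in-json file://glue-policy.json` run against Account B. With a catalog resource policy in place, the EMR instance profile role itself is authorized, which matches the error showing the calls being made as role_Account_A rather than the assumed role.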
Now if you run the EMR job in Account A, you'll see the job running with cross-account access.
It works for our purpose. Try it out.