Does Spark allow using an Amazon assumed role and STS temporary credentials for cross-account Glue access on EMR?

We are trying to connect to a cross-account AWS Glue catalog from an EMR Spark job. From what I have read, AWS supports cross-account access to the Glue catalog in two ways:

  1. IAM role-based. (This is not working for me)
  2. Resource-based policy. (This worked for me)

The problem scenario: Account A creates an EMR cluster with its role role_Account_A, and role_Account_A wants to access the Glue catalog of Account B.

  • Account A creates the EMR cluster with role role_Account_A.
  • Account B has role role_Account_B, which has access to Glue and S3 and lists role_Account_A among its trusted entities.
  • role_Account_A has an sts:AssumeRole policy for the resource role_Account_B.
  • Using the SDK, we are able to assume role_Account_B from role_Account_A and get temporary credentials.
  • The EMR cluster has the following configuration:

    [
      {
        "classification": "spark-hive-site",
        "properties": {
          "hive.metastore.glue.catalogid": "Account_B",
          "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        }
      }
    ]

The Spark job then passes the temporary credentials to the S3A filesystem:

    import org.apache.spark.sql.SparkSession;

    // assumedcreds holds the temporary STS credentials obtained by assuming role_Account_B
    SparkSession sparkSession = SparkSession.builder().appName("testing glue")
                .enableHiveSupport()
                .getOrCreate();
    // Point the S3A filesystem at the temporary (session) credentials
    sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
    sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", assumedcreds.getAccessKeyId());
    sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.secret.key", assumedcreds.getSecretAccessKey());
    sparkSession.sparkContext().hadoopConfiguration().set("fs.s3a.session.token", assumedcreds.getSessionToken());
    // Same keys, also set on the Spark configuration
    sparkSession.sparkContext().conf().set("fs.s3a.access.key", assumedcreds.getAccessKeyId());
    sparkSession.sparkContext().conf().set("fs.s3a.secret.key", assumedcreds.getSecretAccessKey());
    sparkSession.sparkContext().conf().set("fs.s3a.session.token", assumedcreds.getSessionToken());
    // This metastore call is the one that fails
    sparkSession.sql("show databases").show(10, false);

The error we are getting is:

    Caused by: MetaException(message:User: arn:aws:sts::Account_A:assumed-role/role_Account_A/i-xxxxxxxxxxxx is not authorized to perform: glue:GetDatabase on resource: arn:aws:glue:XX-XXXX-X:Account_B:catalog because no resource-based policy allows the glue:GetDatabase action (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: X93Xbc64-0153-XXXX-XXX-XXXXXXX))
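
The principal named in the error already hints at which credentials the Glue client used. The following is a minimal sketch that splits such a principal into its parts, assuming the usual `arn:<partition>:sts::<account>:assumed-role/<role>/<session>` layout. `AssumedRoleArn` is a hypothetical helper written for this illustration and is not part of any AWS SDK. A session name that looks like an EC2 instance ID (`i-...`) suggests the request was signed with the cluster's instance profile credentials rather than with a role_Account_B session.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an AWS SDK class): splits an STS assumed-role ARN into its parts. */
final class AssumedRoleArn {
    // Assumed layout: arn:<partition>:sts::<account>:assumed-role/<role-name>/<session-name>
    private static final Pattern SHAPE =
            Pattern.compile("arn:[^:]+:sts::([^:]+):assumed-role/([^/]+)/(.+)");

    final String account;
    final String roleName;
    final String sessionName;

    private AssumedRoleArn(String account, String roleName, String sessionName) {
        this.account = account;
        this.roleName = roleName;
        this.sessionName = sessionName;
    }

    /** Parses the ARN or throws IllegalArgumentException if it is not an assumed-role ARN. */
    static AssumedRoleArn parse(String arn) {
        Matcher m = SHAPE.matcher(arn);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an assumed-role ARN: " + arn);
        }
        return new AssumedRoleArn(m.group(1), m.group(2), m.group(3));
    }

    /** Heuristic: instance profile sessions are named after the EC2 instance ID. */
    boolean looksLikeInstanceProfileSession() {
        return sessionName.startsWith("i-");
    }

    public static void main(String[] args) {
        // Principal copied from the error message in the question
        AssumedRoleArn caller =
                parse("arn:aws:sts::Account_A:assumed-role/role_Account_A/i-xxxxxxxxxxxx");
        System.out.println("account      = " + caller.account);
        System.out.println("role         = " + caller.roleName);
        System.out.println("session      = " + caller.sessionName);
        System.out.println("instance profile session? " + caller.looksLikeInstanceProfileSession());
    }
}
```

For the principal in the question, the role is role_Account_A in Account_A and the session is an instance ID, which matches the observation that the Glue call was not made as role_Account_B.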

Questions:

  • Does Spark support Glue-specific authentication properties, for example aws.glue.access.key?
  • According to the error, Spark is not using the assumed role role_Account_B; it uses role_Account_A, with which the EMR cluster was created. Can we make it use the assumed role role_Account_B?

I will update the question details if I am missing something.

Did you find the solution? I'm facing the exact same issue! Thank you.

I believe you have an EMR instance profile role in Account A. If so, follow these steps and cross-account access should work.

In Account B:

  1. Under Glue, go to Settings and add the EMR instance profile role from Account A as a principal, granting it access to Account B's Glue and S3. It is recommended to grant access only to the buckets you need.
  2. Go to the bucket policy of the bucket that the Glue table uses, add the EMR instance profile role from Account A as a principal, and grant read/write access.
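
As a rough illustration of step 1, a Glue catalog resource policy in Account B could look something like the sketch below. The account IDs, the region `XX-XXXX-X`, and the role name are the placeholders used in the question. The list of actions is an assumption covering read-only catalog access and should be adjusted to what the job actually needs.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::Account_A:role/role_Account_A"
      },
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions"
      ],
      "Resource": [
        "arn:aws:glue:XX-XXXX-X:Account_B:catalog",
        "arn:aws:glue:XX-XXXX-X:Account_B:database/*",
        "arn:aws:glue:XX-XXXX-X:Account_B:table/*/*"
      ]
    }
  ]
}
```

The bucket policy in step 2 follows the same pattern, with the same role ARN as principal and S3 actions on the bucket and its objects. The role in Account A also needs matching Glue and S3 permissions in its own IAM policy, since cross-account access has to be allowed on both sides.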

Now if you run the EMR job in Account A, you'll see the job running with cross-account access.

It works for our purpose. Try it out.
