
How to properly provide credentials for spark-redshift in EMR instances?

We were trying to use the spark-redshift project, following its third recommendation for providing credentials, namely:

IAM instance profiles: If you are running on EC2 and authenticate to S3 using IAM and instance profiles, then you must configure the temporary_aws_access_key_id, temporary_aws_secret_access_key, and temporary_aws_session_token configuration properties to point to temporary keys created via the AWS Security Token Service. These temporary keys will then be passed to Redshift via LOAD and UNLOAD commands.

Our Spark application runs on an EMR cluster. To that end, we tried to obtain temporary credentials from inside the cluster's instances by calling getSessionToken like this:

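import com.amazonaws.auth.InstanceProfileCredentialsProvider
import com.amazonaws.services.securitytoken.AWSSecurityTokenServiceClient
import com.amazonaws.services.securitytoken.model.GetSessionTokenRequest

// Call STS with the instance profile (role) credentials of the EMR node
val stsClient = new AWSSecurityTokenServiceClient(new InstanceProfileCredentialsProvider())
val getSessionTokenRequest = new GetSessionTokenRequest()
val sessionTokenResult = stsClient.getSessionToken(getSessionTokenRequest)
val sessionCredentials = sessionTokenResult.getCredentials()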

But this throws 403 Access Denied, even though a policy allowing sts:getSessionToken is attached to the role of the EMR instances.

Then we tried the following two alternatives. First, using an AssumeRole policy:

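import com.amazonaws.auth.{AWSSessionCredentials, STSAssumeRoleSessionCredentialsProvider}

// Explicitly assume the EMR role and read the session token from the result
val p = new STSAssumeRoleSessionCredentialsProvider("arn:aws:iam::123456798123:role/My_EMR_Role", "session_name")
val credentials: AWSSessionCredentials = p.getCredentials
val token = credentials.getSessionToken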

And second, casting the result from InstanceProfileCredentialsProvider:

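import com.amazonaws.auth.{AWSSessionCredentials, InstanceProfileCredentialsProvider}

// Unchecked cast: throws ClassCastException if the credentials carry no session token
val provider = new InstanceProfileCredentialsProvider()
val credentials: AWSSessionCredentials = provider.getCredentials.asInstanceOf[AWSSessionCredentials]
val token = credentials.getSessionToken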

They both work, but which is the expected way of doing this? Is there something terribly wrong with casting the result or with adding the AssumeRole policy?

Thanks!

The GetSessionToken API is meant to be called by IAM users, as stated in its documentation:

Returns a set of temporary credentials for an AWS account or IAM user.

In your first example, you are calling the API using your EMR instance role, which is an IAM role rather than an IAM user. In this specific case, the EMR instance role credentials are session credentials obtained by EMR on behalf of your instance.

What is the exact wording of your error? If it is "Cannot call GetSessionToken with session credentials", that would confirm all of the above.

When you cast the credentials returned by InstanceProfileCredentialsProvider to AWSSessionCredentials, it works because, as explained above, an assumed role's credentials are session credentials.
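If the unchecked cast feels fragile, the same assumption can be made explicit with a pattern match. This is only a minimal sketch, assuming the AWS SDK for Java v1 is on the classpath (as in the question's snippets); the object and method names (SessionCredentials, asSessionCredentials) are hypothetical and not part of any library.

```scala
import com.amazonaws.auth.{AWSCredentials, AWSSessionCredentials}

object SessionCredentials {
  // Hypothetical helper: returns Some(credentials) only when they carry a
  // session token; long-term keys (e.g. an IAM user's access key) yield None.
  def asSessionCredentials(credentials: AWSCredentials): Option[AWSSessionCredentials] =
    credentials match {
      case session: AWSSessionCredentials => Some(session)
      case _                              => None
    }
}
```

On an EMR node this could be called as SessionCredentials.asSessionCredentials(new InstanceProfileCredentialsProvider().getCredentials), so that a missing session token surfaces as None instead of a ClassCastException.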

There is nothing wrong with calling AssumeRole explicitly. This is exactly what the EMR service does under the hood. There is also nothing wrong with casting your result to session credentials, as it is pretty much guaranteed to be session credentials in your use case.
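Whichever way the session credentials are obtained, they end up in the three configuration properties quoted in the question. A minimal sketch of that last step follows; the object and method names (RedshiftTempCredentials, options) are hypothetical, and only the three property names come from the spark-redshift excerpt above.

```scala
object RedshiftTempCredentials {
  // Hypothetical helper: maps the three parts of a set of session credentials
  // to the spark-redshift property names quoted in the question.
  def options(accessKeyId: String,
              secretAccessKey: String,
              sessionToken: String): Map[String, String] =
    Map(
      "temporary_aws_access_key_id"     -> accessKeyId,
      "temporary_aws_secret_access_key" -> secretAccessKey,
      "temporary_aws_session_token"     -> sessionToken
    )
}
```

With an AWSSessionCredentials value named credentials, the arguments would be credentials.getAWSAccessKeyId, credentials.getAWSSecretKey and credentials.getSessionToken. The resulting map can presumably be handed to Spark's DataFrameReader or DataFrameWriter through .options(...), next to the other spark-redshift settings. Instance profile credentials expire, so it is probably safest to build the map shortly before each read or write.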


 