
Using AWS credentials profiles with a Spark Scala app

I would like to be able to use the ~/.aws/credentials file I maintain with different profiles with my Spark Scala application, if that is possible. I know how to set Hadoop configurations for s3a inside my app, but I don't want to keep hardcoding different keys and would rather just use my credentials file, as I do with other programs. I've also experimented with the Java API, e.g. val credentials = new DefaultAWSCredentialsProviderChain().getCredentials() and then creating an S3 client, but that doesn't let me use my keys when reading files from S3. I also know that keys can go in core-site.xml when I run my app, but how can I manage different keys, and how can I set things up in IntelliJ so that different keys are pulled in using different profiles?

The generic AWSCredentialsProviderChain contains no providers by default. You need to add some, e.g.:

import com.amazonaws.auth.{AWSCredentialsProviderChain, EnvironmentVariableCredentialsProvider}
import com.amazonaws.auth.profile.ProfileCredentialsProvider

// The chain tries each provider in order: environment variables first,
// then the profiles in ~/.aws/credentials.
val awsCredentials = new AWSCredentialsProviderChain(
  new EnvironmentVariableCredentialsProvider(),
  new ProfileCredentialsProvider())

You can use them with an S3 client or, since you mention Spark, wire them into the Hadoop configuration:

hadoopConfig.set("fs.s3a.access.key", awsCredentials.getAWSAccessKeyId)
hadoopConfig.set("fs.s3a.secret.key", awsCredentials.getAWSSecretKey)

To switch between different AWS profiles, set the AWS_PROFILE environment variable. Happy to expand on any particular point if needed.
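
If you would rather resolve the profile in code, ProfileCredentialsProvider also takes an explicit profile name; a small sketch (the fallback to "default" is an assumption, not required):

import com.amazonaws.auth.profile.ProfileCredentialsProvider

// Use AWS_PROFILE if set, otherwise the "default" profile from ~/.aws/credentials.
val profileName = sys.env.getOrElse("AWS_PROFILE", "default")
val profileProvider = new ProfileCredentialsProvider(profileName)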

If you have the AWS_ environment variables set, spark-submit will copy them over as the s3a secrets.
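
For example (key values elided; the class and jar names are placeholders):

export AWS_ACCESS_KEY_ID=...        # elided
export AWS_SECRET_ACCESS_KEY=...    # elided
spark-submit --class com.example.MyApp my-app.jar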

If you want to set a provider chain for S3A, you can provide a list of provider classes in the option fs.s3a.aws.credentials.provider. These will be created with a Configuration-argument constructor if one is present; otherwise the empty constructor is used. The default list is: one to get secrets from the URI or config, one for environment variables, and finally one for EC2 IAM secrets. You can change them to existing ones (anonymous provider, session provider), or write your own... anything which implements com.amazonaws.auth.AWSCredentialsProvider is allowed.
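
For example, to put environment variables first and then the profile file (these are the SDK v1 class names; a sketch, not the full default list):

hadoopConfig.set("fs.s3a.aws.credentials.provider",
  "com.amazonaws.auth.EnvironmentVariableCredentialsProvider," +
  "com.amazonaws.auth.profile.ProfileCredentialsProvider")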

You should be able to set fs.s3a.aws.credentials.provider to com.amazonaws.auth.profile.ProfileCredentialsProvider and have it picked up locally (maybe you'll need your own wrapper which extracts the profile name from the configuration passed in). This will work on any host which has your credentials... it won't work if you only have local secrets and want to submit work elsewhere. It's probably simplest to set environment variables and have spark-submit propagate them.
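
A minimal sketch of such a wrapper, assuming a made-up option name fs.s3a.profile.name (per the point above, S3A prefers a Configuration-argument constructor when creating the provider):

import com.amazonaws.auth.{AWSCredentials, AWSCredentialsProvider}
import com.amazonaws.auth.profile.ProfileCredentialsProvider
import org.apache.hadoop.conf.Configuration

// Hypothetical wrapper: reads the profile name from the Hadoop configuration
// and delegates credential lookup to the SDK's ProfileCredentialsProvider.
class ConfiguredProfileCredentialsProvider(conf: Configuration)
    extends AWSCredentialsProvider {
  private val delegate =
    new ProfileCredentialsProvider(conf.get("fs.s3a.profile.name", "default"))
  override def getCredentials: AWSCredentials = delegate.getCredentials
  override def refresh(): Unit = delegate.refresh()
}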
