
Passing AWS credentials to Google Cloud Dataflow, Python

I use the Google Cloud Dataflow implementation in Python on Google Cloud Platform. My idea is to use input from AWS S3.

Google Cloud Dataflow (which is based on Apache Beam) supports reading files from S3. However, I cannot find in the documentation the best way to pass credentials to a job. I tried adding AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to the environment variables within the setup.py file. It works locally, but when I package the Cloud Dataflow job as a template and trigger it to run on GCP, it sometimes works and sometimes doesn't, raising a "NoCredentialsError" exception and causing the job to fail.

Is there any coherent, best-practice solution to pass AWS credentials to a Python Google Cloud Dataflow job on GCP?

The options to configure this have finally been added. They are available in Beam 2.26.0 and later.

The pipeline options are --s3_access_key_id and --s3_secret_access_key.
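For illustration, a minimal sketch of passing these credentials like any other pipeline option, assuming Beam 2.26.0+ installed with the apache-beam[aws] extra; the project, region, bucket, and key values are placeholders:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The S3 credentials are ordinary pipeline options, so they can be built
# programmatically (as here) or given on the command line when the job or
# template is submitted. All values below are placeholders.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-gcp-project",            # placeholder GCP project
    "--region=us-central1",                # placeholder region
    "--temp_location=gs://my-bucket/tmp",  # placeholder staging bucket
    "--s3_access_key_id=AKIA...",          # placeholder AWS access key id
    "--s3_secret_access_key=...",          # placeholder AWS secret key
])

with beam.Pipeline(options=options) as p:
    (p
     | "ReadFromS3" >> beam.io.ReadFromText("s3://my-bucket/input/*.csv")
     | "WriteToGCS" >> beam.io.WriteToText("gs://my-bucket/output/result"))
```

Because these are regular pipeline options, the same flags can equally be appended to the command line used to build and submit the pipeline.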


Unfortunately, the Beam 2.25.0 release and earlier don't have a good way of doing this, other than the following:

In this thread a user figured out how to do it in the setup.py file that they provide to Dataflow with their pipeline.
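One way that idea can be implemented is sketched below; it is not necessarily the exact recipe from the linked thread. A custom command is hooked into the package build step that Dataflow runs on each worker, and it writes a credentials file where boto3 looks by default. The command name and key values are placeholders, and baking secrets into the job package is a security trade-off to weigh:

```python
# setup.py -- hedged sketch of the pre-2.26.0 workaround.
import os
import setuptools
from distutils.command.build import build as _build


class build(_build):
    """Run the custom step as part of the normal build on each worker."""
    sub_commands = _build.sub_commands + [("WriteAwsCredentials", None)]


class WriteAwsCredentials(setuptools.Command):
    """Writes ~/.aws/credentials so boto3 can find AWS credentials."""
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        cred_dir = os.path.expanduser("~/.aws")
        os.makedirs(cred_dir, exist_ok=True)
        with open(os.path.join(cred_dir, "credentials"), "w") as f:
            f.write(
                "[default]\n"
                "aws_access_key_id = AKIA...PLACEHOLDER\n"
                "aws_secret_access_key = PLACEHOLDER\n"
            )


setuptools.setup(
    name="my_dataflow_job",   # placeholder package name
    version="0.0.1",
    packages=setuptools.find_packages(),
    cmdclass={"build": build, "WriteAwsCredentials": WriteAwsCredentials},
)
```

In practice the placeholder key values would be injected at template build time (for example from a secret store) rather than committed to source control.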
