
Passing AWS credentials to Google Cloud Dataflow, Python

I use the Python implementation of Google Cloud Dataflow on Google Cloud Platform. My plan is to read input from AWS S3.

Google Cloud Dataflow (which is based on Apache Beam) supports reading files from S3. However, I cannot find in the documentation the best way to pass credentials to a job. I tried setting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables in the setup.py file. That works locally, but when I package the Cloud Dataflow job as a template and trigger it to run on GCP, it sometimes works and sometimes raises a NoCredentialsError exception, causing the job to fail.
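For context, the attempt described above looks roughly like the sketch below. The package name and key values are placeholders, and the assumption that the variables are set via os.environ inside setup.py is mine:

```python
# setup.py -- rough sketch of the attempt described above; all values are placeholders.
import os
import setuptools

# These assignments only affect the process that runs setup.py (e.g. pip on the
# Dataflow worker), not necessarily the separate worker harness process that
# later reads from S3 -- which may be why boto3 sometimes cannot find the
# credentials and raises NoCredentialsError.
os.environ["AWS_ACCESS_KEY_ID"] = "AKIAXXXXXXXXXXXXXXXX"
os.environ["AWS_SECRET_ACCESS_KEY"] = "XXXXXXXXXXXXXXXXXXXXXXXX"

setuptools.setup(
    name="my-dataflow-job",
    version="0.0.1",
    packages=setuptools.find_packages(),
)
```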

Is there any coherent, best-practice way to pass AWS credentials to a Python Google Cloud Dataflow job on GCP?

The options to configure this have finally been added. They are available in Beam 2.26.0 and later.

The pipeline options are --s3_access_key_id and --s3_secret_access_key.
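A minimal sketch of passing these options, assuming Beam 2.26.0+ with the AWS extra installed (`pip install "apache-beam[gcp,aws]"`); the project, buckets, and key values are placeholders:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# All resource names and credential values below are placeholders.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-gcp-project",
    "--region=us-central1",
    "--temp_location=gs://my-gcs-bucket/tmp",
    "--s3_access_key_id=AKIAXXXXXXXXXXXXXXXX",
    "--s3_secret_access_key=XXXXXXXXXXXXXXXXXXXXXXXX",
])

with beam.Pipeline(options=options) as pipeline:
    (pipeline
     | "ReadFromS3" >> beam.io.ReadFromText("s3://my-source-bucket/input/*.txt")
     | "Print" >> beam.Map(print))
```

Because these are ordinary pipeline options, they can also be supplied on the command line or baked into a Dataflow template at build time instead of being hard-coded.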


Unfortunately, Beam 2.25.0 and earlier releases don't have a good way of doing this, other than the following:

In this thread, a user figured out how to do it in the setup.py file that they supply to Dataflow along with their pipeline.
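The exact code from that thread isn't reproduced here, but one common variant of the setup.py workaround is to write an AWS credentials file on each worker while the job package is being installed. The sketch below is an assumption of how that can look; the package name and credential values are illustrative only:

```python
# setup.py -- hedged sketch of a setup.py-based workaround for Beam <= 2.25.0:
# write ~/.aws/credentials on each worker during package installation.
import os
import setuptools
from setuptools.command.install import install


class InstallAwsCredentials(install):
    """Drops an AWS credentials file on the worker, then runs the normal install."""

    def run(self):
        aws_dir = os.path.expanduser("~/.aws")
        os.makedirs(aws_dir, exist_ok=True)
        with open(os.path.join(aws_dir, "credentials"), "w") as handle:
            handle.write(
                "[default]\n"
                "aws_access_key_id = AKIAXXXXXXXXXXXXXXXX\n"
                "aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXX\n"
            )
        install.run(self)


setuptools.setup(
    name="my-dataflow-job",
    version="0.0.1",
    packages=setuptools.find_packages(),
    cmdclass={"install": InstallAwsCredentials},
)
```

Embedding keys in setup.py ships them inside the job package, so the pipeline-option approach above is preferable wherever the Beam version allows it.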
