
Passing AWS credentials to Google Cloud Dataflow, Python

I use the Python implementation of Google Cloud Dataflow on Google Cloud Platform. My plan is to read input from AWS S3.

Google Cloud Dataflow (which is based on Apache Beam) supports reading files from S3. However, I cannot find in the documentation the best way to pass credentials to a job. I tried setting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as environment variables in the setup.py file. That works locally, but when I package the Cloud Dataflow job as a template and trigger it to run on GCP, it sometimes works and sometimes raises a NoCredentialsError exception, causing the job to fail.
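For context, the attempt described above looks roughly like the sketch below. The package name and key values are placeholders, and the assumption that the variables are set via os.environ inside setup.py is mine:

```python
# setup.py -- rough sketch of the attempt described above; all values are placeholders.
import os
import setuptools

# These assignments only affect the process that runs setup.py (e.g. pip on the
# Dataflow worker), not necessarily the separate worker harness process that
# later reads from S3 -- which may be why boto3 sometimes cannot find the
# credentials and raises NoCredentialsError.
os.environ["AWS_ACCESS_KEY_ID"] = "AKIAXXXXXXXXXXXXXXXX"
os.environ["AWS_SECRET_ACCESS_KEY"] = "XXXXXXXXXXXXXXXXXXXXXXXX"

setuptools.setup(
    name="my-dataflow-job",
    version="0.0.1",
    packages=setuptools.find_packages(),
)
```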

Is there any coherent, best-practice way to pass AWS credentials to a Python Google Cloud Dataflow job on GCP?

The options to configure this have finally been added. They are available in Beam 2.26.0 and later.

The pipeline options are --s3_access_key_id and --s3_secret_access_key.
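A minimal sketch of passing these options, assuming Beam 2.26.0+ with the AWS extra installed (`pip install "apache-beam[gcp,aws]"`); the project, buckets, and key values are placeholders:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# All resource names and credential values below are placeholders.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-gcp-project",
    "--region=us-central1",
    "--temp_location=gs://my-gcs-bucket/tmp",
    "--s3_access_key_id=AKIAXXXXXXXXXXXXXXXX",
    "--s3_secret_access_key=XXXXXXXXXXXXXXXXXXXXXXXX",
])

with beam.Pipeline(options=options) as pipeline:
    (pipeline
     | "ReadFromS3" >> beam.io.ReadFromText("s3://my-source-bucket/input/*.txt")
     | "Print" >> beam.Map(print))
```

Because these are ordinary pipeline options, they can also be supplied on the command line or baked into a Dataflow template at build time instead of being hard-coded.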


Unfortunately, Beam 2.25.0 and earlier releases don't have a good way of doing this, other than the following:

In this thread, a user figured out how to do it in the setup.py file that they supply to Dataflow along with their pipeline.
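The exact code from that thread isn't reproduced here, but one common variant of the setup.py workaround is to write an AWS credentials file on each worker while the job package is being installed. The sketch below is an assumption of how that can look; the package name and credential values are illustrative only:

```python
# setup.py -- hedged sketch of a setup.py-based workaround for Beam <= 2.25.0:
# write ~/.aws/credentials on each worker during package installation.
import os
import setuptools
from setuptools.command.install import install


class InstallAwsCredentials(install):
    """Drops an AWS credentials file on the worker, then runs the normal install."""

    def run(self):
        aws_dir = os.path.expanduser("~/.aws")
        os.makedirs(aws_dir, exist_ok=True)
        with open(os.path.join(aws_dir, "credentials"), "w") as handle:
            handle.write(
                "[default]\n"
                "aws_access_key_id = AKIAXXXXXXXXXXXXXXXX\n"
                "aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXX\n"
            )
        install.run(self)


setuptools.setup(
    name="my-dataflow-job",
    version="0.0.1",
    packages=setuptools.find_packages(),
    cmdclass={"install": InstallAwsCredentials},
)
```

Embedding keys in setup.py ships them inside the job package, so the pipeline-option approach above is preferable wherever the Beam version allows it.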
