
Google Cloud Dataflow - providing an sdk_location in pipeline options

background:

I have a web app that calls Google Dataflow, and I recently wanted to use the sdk_location parameter in the pipeline options.
I downloaded the Apache Beam SDK and uploaded it to a GCS bucket as a tar.gz file.
I then added sdk_location={location of the .tar.gz file}, roughly as in the sketch below.
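
A minimal sketch of how the option is being passed. The project id, region, and bucket paths are placeholders, not the real values from my project:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical values; substitute your own project, region, and bucket.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    # Points at the Beam SDK tarball uploaded to GCS.
    sdk_location="gs://my-bucket/sdks/apache-beam-2.x.x.tar.gz",
)

with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)
```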

However, when I now make the Dataflow API call, I get the following error in the worker startup logs:
Failed to install worker package.

Has anyone else had this issue?
And is that the correct tarball to have provided?

When using Dataflow, you don't need to set the sdk_location option unless you have made changes to the Beam SDK itself.

If the worker package fails to install, check whether you are missing dependencies needed to run your job. For example, if you are using the Python SDK, see: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
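
If the goal is just to get extra Python packages onto the workers, a sketch like the following (with hypothetical file names and bucket paths) is usually enough, without overriding sdk_location at all:

```python
from apache_beam.options.pipeline_options import PipelineOptions

# requirements_file stages PyPI dependencies to the workers;
# setup_file is needed when the pipeline spans multiple local modules.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    requirements_file="requirements.txt",
    setup_file="./setup.py",
)
```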
