
Use of experiments=no_use_multiple_sdk_containers in Google Cloud Dataflow

Issue Summary: Hi, I am using Avro version 1.11.0 to parse and decode an Avro file. We have a custom requirement, so I am not able to use ReadFromAvro. When I run this on Dataflow, a dependency issue arises because avro-python3 version 1.8.2 is already installed in the worker container. The problem is the class TimestampMillisSchema, which is not present in avro-python3: the job fails stating "Attribute TimestampMillisSchema not found in avro.schema". I then tried passing a requirements file with avro==1.11.0, but then the Dataflow job was not able to start, giving the error "Error syncing pod", which seems to be caused by a dependency conflict.
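
For context, the custom decoding path might look roughly like the sketch below: a minimal DoFn, assuming the input is a PCollection of binary-encoded Avro payloads and that the writer schema is known up front. The name DecodeAvroRecord and the schema_json parameter are illustrative, not from the original post; the point is that this code needs avro 1.11.x, where logical types such as timestamp-millis parse into avro.schema.TimestampMillisSchema.

import io

import apache_beam as beam
import avro.io
import avro.schema  # avro 1.11.x; TimestampMillisSchema is defined here


class DecodeAvroRecord(beam.DoFn):
    """Decode binary-encoded Avro payloads with a known writer schema."""

    def __init__(self, schema_json: str):
        # Store only the JSON string; the parsed schema and reader are
        # built per worker in setup() so they need not be serialized
        # along with the DoFn.
        self.schema_json = schema_json

    def setup(self):
        # avro.schema.parse resolves logical types such as
        # timestamp-millis into TimestampMillisSchema, which
        # avro-python3 1.8.x lacks.
        self.schema = avro.schema.parse(self.schema_json)
        self.reader = avro.io.DatumReader(self.schema)

    def process(self, payload: bytes):
        decoder = avro.io.BinaryDecoder(io.BytesIO(payload))
        yield self.reader.read(decoder)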

To solve the issue, we set an experiment flag (--experiments=no_use_multiple_sdk_containers), and the pipeline ran fine.
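
For reference, the flag is passed like any other pipeline option; a minimal sketch from Python is below, where the project, region, and bucket names are placeholders. On the performance question: this experiment makes Dataflow start a single SDK container per worker VM instead of one per core, so it can reduce worker parallelism, and hence throughput, on multi-core machines.

from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder project, region, and bucket values; substitute your own.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
    "--requirements_file=requirements.txt",  # pins avro==1.11.0
    "--experiments=no_use_multiple_sdk_containers",
])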

I want to know whether there is a better solution to my issue, and also whether the above flag will affect pipeline performance.

Please try running the job with these Dataflow pipeline options:

--prebuild_sdk_container_engine=cloud_build --experiments=use_runner_v2

This uses Cloud Build to build an SDK container with your extra dependencies and then runs the Dataflow job with that prebuilt container.
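
A minimal sketch of those options set from Python is below, assuming a requirements.txt that pins avro==1.11.0. The project, region, bucket, and registry values are placeholders, and whether --docker_registry_push_url must be set explicitly or is defaulted depends on the Beam SDK version.

from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder identifiers; substitute your own project, region, bucket,
# and container registry path.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
    "--requirements_file=requirements.txt",  # pins avro==1.11.0
    "--experiments=use_runner_v2",
    "--prebuild_sdk_container_engine=cloud_build",
    "--docker_registry_push_url=gcr.io/my-project/prebuilt-sdk",
])

Because the dependencies are installed once at image build time, workers start from the prebuilt container instead of resolving avro==1.11.0 in every SDK container, which avoids the "Error syncing pod" startup failures.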
