Use of experiments=no_use_multiple_sdk_containers in Google Cloud Dataflow
Issue summary: Hi, I am using Avro version 1.11.0 to parse and decode an Avro file. We have a custom requirement, so I am not able to use ReadFromAvro.

When running this on Dataflow, a dependency issue arises because avro-python3 version 1.8.2 is already installed in the worker environment. The problem is the class TimestampMillisSchema, which is not present in avro-python3: the pipeline fails with "Attribute TimestampMillisSchema not found in avro.schema".

I then tried passing a requirements file with avro==1.11.0, but the Dataflow job failed to start, giving the error "Error syncing pod", which appears to be caused by a dependency conflict.
To work around the issue, we set an experiment flag (--experiments=no_use_multiple_sdk_containers), and the pipeline ran fine.

I would like to know whether there is a better solution to this issue, and also whether the above flag will affect pipeline performance.
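For context, the requirements file passed to the job would look something like this (a minimal sketch; the exact contents beyond the avro pin are not given in the question):

```
# requirements.txt passed to Dataflow via --requirements_file
avro==1.11.0
```

Note that this pin conflicts with the avro-python3 package preinstalled on the workers, which is what appears to trigger the "Error syncing pod" failure.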
Please try running the Dataflow job with the following flags:
--prebuild_sdk_container_engine=cloud_build --experiments=use_runner_v2
This uses Cloud Build to build a container image that includes your extra dependencies, and then uses that image for the Dataflow run.
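A minimal sketch of the full run command with those flags (the project, region, bucket, registry path, and pipeline filename are placeholders, not taken from the question):

```shell
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/temp \
  --requirements_file=requirements.txt \
  --experiments=use_runner_v2 \
  --prebuild_sdk_container_engine=cloud_build \
  --docker_registry_push_url=gcr.io/my-project/prebuilt-sdk
```

Because the dependencies are resolved once at container build time rather than on every worker at startup, this approach avoids both the per-worker pip install conflicts and the need for no_use_multiple_sdk_containers.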