
Use of experiments=no_use_multiple_sdk_containers in Google Cloud Dataflow

Issue summary: Hi, I am using avro version 1.11.0 to parse an Avro file and decode it. We have a custom requirement, so I am not able to use ReadFromAvro. When trying this on Dataflow, a dependency issue arises because avro-python3 version 1.8.2 is already available on the workers. The problem is the class TimestampMillisSchema, which is not present in avro-python3: the job fails stating that attribute TimestampMillisSchema is not found in avro.schema. I then tried passing a requirements file with avro==1.11.0, but then the Dataflow job was not able to start, giving the error "Error syncing pod", which seems to be caused by a dependency conflict.
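For reference, a minimal sketch of how such a requirements file can be passed on a standard Beam Python launch (project, bucket, and file names below are placeholders, not from the original question):

from apache_beam.options.pipeline_options import PipelineOptions

# requirements.txt contains the single line: avro==1.11.0
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                  # placeholder
    region="us-central1",                  # placeholder
    temp_location="gs://my-bucket/temp",   # placeholder
    requirements_file="requirements.txt",  # installed on each worker at startup
)

Dataflow installs the packages listed in requirements_file on the workers at startup, which is where the "Error syncing pod" dependency conflict described above surfaced.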

To solve the issue, we set an experiment flag (--experiments=no_use_multiple_sdk_containers), and the pipeline ran fine.
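A minimal sketch of setting that experiment programmatically (same placeholder names as above):

from apache_beam.options.pipeline_options import PipelineOptions

# With this experiment, Dataflow starts a single SDK container per worker VM
# instead of one per vCPU core, so dependencies are installed only once per VM;
# the trade-off is potentially reduced parallelism on multi-core workers.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                  # placeholder
    region="us-central1",                  # placeholder
    temp_location="gs://my-bucket/temp",   # placeholder
    requirements_file="requirements.txt",
    experiments=["no_use_multiple_sdk_containers"],
)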

I would like to know if there is a better solution to my issue, and also whether the above flag will affect pipeline performance.

Please try the Dataflow run command with:

--prebuild_sdk_container_engine=cloud_build --experiments=use_runner_v2

This will use Cloud Build to build a container with your extra dependencies and then use that container within the Dataflow run.
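A minimal sketch of the suggested launch expressed as pipeline options (all resource names are placeholders; depending on the SDK version, a container registry to push the prebuilt image to may also need to be supplied):

from apache_beam.options.pipeline_options import PipelineOptions

# Prebuilding installs the extra dependencies once, at job submission time,
# instead of on every worker at startup, which avoids startup-time
# dependency-install failures such as "Error syncing pod".
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                  # placeholder
    region="us-central1",                  # placeholder
    temp_location="gs://my-bucket/temp",   # placeholder
    requirements_file="requirements.txt",  # avro==1.11.0
    prebuild_sdk_container_engine="cloud_build",
    experiments=["use_runner_v2"],
    # docker_registry_push_url="us-central1-docker.pkg.dev/my-project/my-repo",  # placeholder, if required
)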
