
Use of experiments=no_use_multiple_sdk_containers in Google Cloud Dataflow

Issue summary: Hi, I am using avro version 1.11.0 to parse an Avro file and decode it. We have a custom requirement, so I am not able to use ReadFromAvro. When trying this on Dataflow, a dependency issue arises because avro-python3 version 1.8.2 is already available on the workers. The problem is the class TimestampMillisSchema, which is not present in avro-python3: the job fails stating that attribute TimestampMillisSchema is not found in avro.schema. I then tried passing a requirements file with avro==1.11.0, but then the Dataflow job was not able to start, giving the error "Error syncing pod", which seems to be caused by a dependency conflict.
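For reference, a minimal sketch of how such a requirements file can be passed on a standard Beam Python launch (project, bucket, and file names below are placeholders, not from the original question):

from apache_beam.options.pipeline_options import PipelineOptions

# requirements.txt contains the single line: avro==1.11.0
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                  # placeholder
    region="us-central1",                  # placeholder
    temp_location="gs://my-bucket/temp",   # placeholder
    requirements_file="requirements.txt",  # installed on each worker at startup
)

Dataflow installs the packages listed in requirements_file on the workers at startup, which is where the "Error syncing pod" dependency conflict described above surfaced.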

To solve the issue, we set an experiment flag (--experiments=no_use_multiple_sdk_containers), and the pipeline ran fine.
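A minimal sketch of setting that experiment programmatically (same placeholder names as above):

from apache_beam.options.pipeline_options import PipelineOptions

# With this experiment, Dataflow starts a single SDK container per worker VM
# instead of one per vCPU core, so dependencies are installed only once per VM;
# the trade-off is potentially reduced parallelism on multi-core workers.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                  # placeholder
    region="us-central1",                  # placeholder
    temp_location="gs://my-bucket/temp",   # placeholder
    requirements_file="requirements.txt",
    experiments=["no_use_multiple_sdk_containers"],
)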

I would like to know if there is a better solution to my issue, and also whether the above flag will affect pipeline performance.

Please try the Dataflow run command with:

--prebuild_sdk_container_engine=cloud_build --experiments=use_runner_v2

This will use Cloud Build to build a container with your extra dependencies and then use that container within the Dataflow run.
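A minimal sketch of the suggested launch expressed as pipeline options (all resource names are placeholders; depending on the SDK version, a container registry to push the prebuilt image to may also need to be supplied):

from apache_beam.options.pipeline_options import PipelineOptions

# Prebuilding installs the extra dependencies once, at job submission time,
# instead of on every worker at startup, which avoids startup-time
# dependency-install failures such as "Error syncing pod".
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",                  # placeholder
    region="us-central1",                  # placeholder
    temp_location="gs://my-bucket/temp",   # placeholder
    requirements_file="requirements.txt",  # avro==1.11.0
    prebuild_sdk_container_engine="cloud_build",
    experiments=["use_runner_v2"],
    # docker_registry_push_url="us-central1-docker.pkg.dev/my-project/my-repo",  # placeholder, if required
)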
