
Building containers within a Google Dataflow pipeline

tl;dr An Apache Beam pipeline step involves building a Docker image. How can this pipeline be run on Google Dataflow? What alternatives exist?

I'm currently trying to take my first steps with Google's Dataflow service and Apache Beam (Python).

Trivial examples are pretty straightforward, but things get confusing as soon as external software dependencies come into play. It seems to be possible to use custom Docker containers to set up one's own environment [1][2]. While that's great for most dependencies, it doesn't help if the dependency is Docker itself, which is the case for me: one step of my pipeline uses an external project that makes heavy use of Docker (i.e. building images and running them).

As far as I can tell, there are three options to tackle that problem:

  1. Docker within Docker: Run the external project's scripts, which build Docker images, within a Docker container running on a Dataflow worker node. While building a Docker image within Docker is possible in principle [3], I have the feeling that it won't work in this case, since there is only very limited control over the environment.
  2. Custom VM image for worker nodes: Is it possible to use custom VM images for Dataflow worker nodes?
  3. Don't use Google Dataflow: What are better-suited alternative services?
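
For option 1, the step that calls the external project's build could be sketched roughly as below. This is only an illustration: the helper names (`docker_build_command`, `docker_cli_available`, `build_image`) are hypothetical, and it assumes the `docker` CLI is installed in the container and can reach a Docker daemon, which is exactly the part that is uncertain on a Dataflow worker.

```python
import shutil
import subprocess


def docker_build_command(tag, context_dir):
    """Return the argv list for `docker build -t <tag> <context_dir>`."""
    return ["docker", "build", "-t", tag, context_dir]


def docker_cli_available():
    """Return True if a `docker` executable is on PATH in this environment."""
    return shutil.which("docker") is not None


def build_image(tag, context_dir):
    """Build an image, failing early if the Docker CLI is missing."""
    if not docker_cli_available():
        raise RuntimeError("docker CLI not found in this environment")
    # check=True raises CalledProcessError if the build itself fails,
    # for example because no Docker daemon is reachable.
    subprocess.run(docker_build_command(tag, context_dir), check=True)
```

Finding the CLI on PATH does not mean a daemon is reachable, so a failing `build_image` call inside the worker container would be the real test of whether option 1 is viable.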

Thanks!

[1] Custom VM images for Google Cloud Dataflow workers

[2] https://cloud.google.com/dataflow/docs/guides/using-custom-containers

[3] https://www.docker.com/blog/docker-can-now-run-within-docker/

Edit: Added line breaks.

Custom VM image for worker nodes: Is it possible to use custom VM images for Dataflow worker nodes?

It's not possible to completely replace the Dataflow worker. But you can use a custom Beam SDK Docker container, as you noted. In your case, this will result in a Docker-in-Docker type of execution.
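
As a minimal sketch of pointing a Python pipeline at a custom SDK container, the flags could be assembled as below. The helper `dataflow_flags` and all values (project, region, bucket, image name) are hypothetical placeholders. The sketch assumes a recent Beam release in which the option is named `--sdk_container_image`; older releases used a different name for it.

```python
def dataflow_flags(project, region, temp_location, image):
    """Assemble Beam pipeline flags for Dataflow with a custom SDK container."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Assumption: option name as used by recent Beam Python SDKs.
        f"--sdk_container_image={image}",
    ]


# Placeholder values, not taken from the question.
flags = dataflow_flags(
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    image="gcr.io/my-project/my-beam-image:latest",
)

# With apache_beam installed, the list could be handed on like this:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(flags)
```

The custom image only changes what is installed in the SDK container. Any Docker build started from inside it still runs nested on the worker.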

Don't use Google Dataflow: What are better-suited alternative services?

Please see here for other Beam runners and their capabilities.

