Building containers within a Google Dataflow pipeline
tl;dr An Apache Beam pipeline step involves building a Docker image. How can I run this pipeline using Google Dataflow? What alternatives exist?
I'm currently trying to take my first steps with Google's Dataflow service and Apache Beam (Python).
Trivial examples are pretty straightforward, but things get confusing as soon as external software dependencies come into play. It seems to be possible to use custom Docker containers to set up one's own environment [1][2]. While that's great for most dependencies, it doesn't help if the dependency is Docker itself, which happens to be the case for me: one step of my pipeline uses an external project that makes heavy use of Docker (i.e. building images and running them).
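To make the problem concrete, the step in question might look roughly like the sketch below. The helper names, image tag, and build context are hypothetical; the only real interface used is the `docker build -t <tag> <context>` CLI call. In a Beam pipeline, `build_image` would be called from inside a `DoFn` or `beam.Map`, which is where it breaks if the worker environment has no usable Docker.

```python
import shutil
import subprocess


def docker_build_command(tag, context_dir):
    """Return the argv for `docker build -t <tag> <context_dir>`."""
    return ["docker", "build", "-t", tag, context_dir]


def build_image(tag, context_dir):
    """Build an image, failing early if the docker CLI is not on PATH.

    Hypothetical helper: it only checks for the CLI binary, not for a
    reachable Docker daemon, so the build itself can still fail.
    """
    if shutil.which("docker") is None:
        raise RuntimeError("docker CLI not found in this environment")
    # check=True raises CalledProcessError if the build exits non-zero.
    subprocess.run(docker_build_command(tag, context_dir), check=True)
    return tag
```

Keeping the command construction separate from the `subprocess` call makes it possible to inspect what would be run without needing Docker installed.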
As far as I can tell there are three options to tackle that problem:
Thanks!
[1] Custom VM images for Google Cloud Dataflow workers
[2] https://cloud.google.com/dataflow/docs/guides/using-custom-containers
[3] https://www.docker.com/blog/docker-can-now-run-within-docker/
Edit: Added line breaks.
Custom VM image for worker nodes: Is it possible to use custom VM images for Dataflow worker nodes?
It's not possible to completely replace the Dataflow worker, but you can use a custom Beam SDK Docker container, as you noted. In your case this will result in a Docker-in-Docker style of execution.
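As a rough sketch of how a custom SDK container is selected, the flags below can be passed to a Beam Python pipeline. The project, region, bucket, and image names are placeholders, and the helper function is hypothetical. `--sdk_container_image` is the flag name in recent Beam SDKs; older versions may use a different name, so check the documentation for the version you run.

```python
def dataflow_args(project, region, temp_location, sdk_container_image):
    """Build command-line flags for running a Beam pipeline on Dataflow
    with a custom SDK container image."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        f"--sdk_container_image={sdk_container_image}",
    ]


# All values below are placeholders, not real resources.
args = dataflow_args(
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
    sdk_container_image="gcr.io/my-project/my-beam-image:latest",
)

# With apache_beam installed, the flags can be handed to the pipeline:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(args)
```

Note that this only swaps the container the SDK harness runs in; whether Docker can be used from inside that container is a separate question.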
Don't use Google Dataflow: What are better-suited alternative services?
Please see here for other Beam runners and their capabilities.