Building containers within a Google Dataflow pipeline

tl;dr: One step of my Apache Beam pipeline involves building a Docker image. How can I run this pipeline on Google Dataflow? What alternatives exist?

I'm currently trying to take my first steps with Google's Dataflow service and Apache Beam (Python).

Trivial examples are pretty straightforward, but things get confusing as soon as external software dependencies come into play. It seems to be possible to use custom Docker containers to set up one's own environment [1][2]. While that's great for most dependencies, it doesn't help if the dependency is Docker itself, as is the case for me: one step of my pipeline uses an external project which makes heavy use of Docker (i.e. building images and running them).

As far as I can tell, there are three options to tackle that problem:

  1. Docker within Docker: Run the external project's scripts, which build Docker images, inside a Docker container running on a Dataflow worker node. While building a Docker image within Docker is possible in principle [3], I have the feeling that it won't work in this case, since there is only very limited control over the environment.
  2. Custom VM image for worker nodes: Is it possible to use custom VM images for Dataflow worker nodes?
  3. Don't use Google Dataflow: Which alternative services are better suited?
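
To make option 1 concrete: whether it is viable depends on whether a Docker CLI and a reachable Docker daemon exist inside the worker's container. The following is only a minimal sketch of such a check, using nothing but the Python standard library. The function name `docker_available` is made up for illustration, and the check relies on the assumption that `docker info` exits with a non-zero status when no daemon can be reached, which holds for recent Docker versions as far as I know.

```python
import shutil
import subprocess


def docker_available(timeout_seconds: float = 10.0) -> bool:
    """Return True only if a docker CLI is on PATH and `docker info` succeeds.

    Hypothetical helper for illustration; not part of Beam or Dataflow.
    """
    docker_path = shutil.which("docker")
    if docker_path is None:
        # No docker binary in this environment at all.
        return False
    try:
        # Assumption: `docker info` returns a non-zero exit code when the
        # client cannot talk to a daemon.
        result = subprocess.run(
            [docker_path, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary could not be executed, or the daemon did not answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("docker usable here:", docker_available())
```

Calling this at the start of the Docker-dependent step (for example in a DoFn's setup) would at least make the step fail early with a clear message instead of halfway through an image build.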

Thanks!

[1] Custom VM images for Google Cloud Dataflow workers

[2] https://cloud.google.com/dataflow/docs/guides/using-custom-containers

[3] https://www.docker.com/blog/docker-can-now-run-within-docker/

Edit: Added line breaks.

Custom VM image for worker nodes: Is it possible to use custom VM images for Dataflow worker nodes?

It's not possible to completely replace the Dataflow worker, but you can use a custom Beam SDK Docker container, as you noted. In your case, this results in a Docker-in-Docker style of execution.
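
As a rough sketch of what selecting a custom SDK container looks like from the Python side, the helper below assembles the command-line flags for a Dataflow run. The function name `dataflow_args` and all values in the usage part (project, region, bucket, image URI) are placeholders invented for illustration. The flag `--sdk_container_image` is the name used by recent Beam Python SDKs; older SDK versions used a different flag, so check this against the version you run.

```python
from typing import List


def dataflow_args(
    project: str,
    region: str,
    temp_location: str,
    sdk_container_image: str,
) -> List[str]:
    """Build pipeline flags for a Dataflow run with a custom SDK container.

    Hypothetical helper; it only formats strings and does not call Beam.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Image that the workers start the Beam SDK harness from.
        f"--sdk_container_image={sdk_container_image}",
    ]


if __name__ == "__main__":
    # Placeholder values, not taken from the question.
    args = dataflow_args(
        project="my-project",
        region="europe-west1",
        temp_location="gs://my-bucket/tmp",
        sdk_container_image="gcr.io/my-project/my-beam-image:latest",
    )
    # With Beam installed, this list can be passed on, e.g.:
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   options = PipelineOptions(args)
    print(args)
```

Note that this only controls the container the SDK harness runs in. It does not by itself give that container access to a Docker daemon.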

Don't use Google Dataflow: Which alternative services are better suited?

Please see the Apache Beam documentation for the other Beam runners and their capabilities.
