Docker design: exchange data between containers or put multiple processes in one container?
In a current project I have to perform the following tasks (among others):
Currently, the stitching and the streaming run in one docker container, and the object detection runs in another, reading the panorama stream as input.
Since I need to increase the input resolution for the object detector while maintaining the stream resolution for the UI, I have to look for alternative ways of getting the stitched (full-resolution) panorama (~10 MB per frame) from the stitcher container to the detector container.
My thoughts regarding potential solutions:
Since I'm not the sharpest knife in the docker drawer, what I'm asking for are tips, experiences and best practices regarding fast data exchange between docker containers.
Usually most communication between Docker containers is over network sockets. This is fine when you're talking to something like a relational database or an HTTP server. It sounds like your application is a little more about sharing files, though, and that's something Docker is a little less good at.
If you only want one copy of each component, or are still actively developing the pipeline: I'd probably not use Docker for this. Since each container has an isolated filesystem and its own user ID space, sharing files can be unexpectedly tricky (every container must agree on numeric user IDs). But if you just run everything on the host, as the same user, pointing at the same directory, this isn't a problem.
If you're trying to scale this in production: I'd add some sort of shared filesystem and a message queueing system like RabbitMQ. For local work this could be a Docker named volume or bind-mounted host directory; cloud storage like Amazon S3 will work fine too. The setup is like this:
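A minimal sketch of the producer side in Python, assuming a shared directory that is mounted into both containers and the `pika` RabbitMQ client. The `publish_frame` helper, the `frames` queue name, and the file-naming scheme are hypothetical illustrations, not part of the original answer:

```python
import json
import os
import uuid


def publish_frame(frame_bytes, shared_dir, channel=None):
    """Write a full-resolution frame to shared storage and describe it in a
    small queue message, so the detector container can fetch it later."""
    frame_id = uuid.uuid4().hex
    path = os.path.join(shared_dir, f"{frame_id}.pano")  # hypothetical naming scheme
    with open(path, "wb") as f:
        f.write(frame_bytes)
    # The queue message stays tiny: it carries the path, not the ~10 MB frame.
    message = {"frame_id": frame_id, "path": path}
    if channel is not None:
        # channel is a pika BlockingChannel connected to RabbitMQ
        channel.basic_publish(exchange="",
                              routing_key="frames",
                              body=json.dumps(message))
    return message
```

The detector side would consume from the same queue, read the file at the path from the message, and delete the file once the result has been recorded.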
In this setup each component is totally stateless. If you discover that, for example, the machine-learning component of this is slowest, you can run duplicate copies of it. If something breaks, RabbitMQ will remember that a given message hasn't been fully processed (acknowledged); and again because of the isolation you can run that specific component locally to reproduce and fix the issue.
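The acknowledgement behaviour can be sketched on the consumer side like this; `handle_delivery` and `process_frame` are hypothetical names, and the commented-out calls show roughly where pika's `basic_ack`/`basic_nack` would go:

```python
import json


def handle_delivery(body, process_frame):
    """Decode a queue message, run the detector on the referenced file, and
    report whether the message may be acknowledged. Acknowledging only after
    processing succeeds means RabbitMQ will redeliver any frame that a
    crashed or failing container never finished."""
    message = json.loads(body)
    try:
        process_frame(message["path"])  # e.g. run object detection on the file
        return True    # safe to channel.basic_ack(delivery_tag=...)
    except Exception:
        return False   # channel.basic_nack(..., requeue=True) for redelivery
```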
This model also translates well to larger-scale Docker-based cluster-computing systems like Kubernetes.
Running this locally, I would absolutely keep separate concerns in separate containers (especially if individual image-processing and ML tasks are expensive). The setup I propose needs both a message queue (to keep track of the work) and a shared filesystem (because message queues tend not to be optimized for 10+ MB individual messages). You get a choice between Docker named volumes and host bind-mounts as readily available shared storage. Bind mounts are easier to inspect and administer, but on some platforms are legendarily slow. Named volumes I think are reasonably fast, but you can only access them from Docker containers, which means needing to launch more containers to do basic things like backup and pruning.
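As an illustration, sharing a named volume between the two containers with Docker Compose might look roughly like this (the service and image names are hypothetical):

```yaml
version: "3.8"
services:
  stitcher:
    image: example/stitcher:latest    # hypothetical image name
    volumes:
      - frames:/data/frames           # writes full-resolution panoramas here
  detector:
    image: example/detector:latest    # hypothetical image name
    volumes:
      - frames:/data/frames           # reads the same files back
volumes:
  frames:                             # Docker-managed named volume
```

Swapping the `frames:` entries for a host path like `./frames:/data/frames` would give you the bind-mount variant instead.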
Alright, let's unpack this:
If you really want to run multiple processes in one container, it's possible. There are multiple ways to achieve that, however I prefer supervisord.
https://docs.docker.com/config/containers/multi-service_container/
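For reference, a minimal supervisord setup for two processes in one container might look like this; the program names and commands are placeholders, not from the original answer:

```ini
; supervisord.conf -- run supervisord as the container's main process,
; e.g. CMD ["supervisord", "-c", "/etc/supervisord.conf"] in the Dockerfile
[supervisord]
nodaemon=true            ; stay in the foreground so the container keeps running

[program:stitcher]
command=/app/stitcher    ; placeholder command
autorestart=true

[program:detector]
command=/app/detector    ; placeholder command
autorestart=true
```

Note that with this approach Docker only supervises supervisord itself, so per-process restart and logging policy moves into this config file.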