In dask, what is the easiest way to run a task that itself runs a docker container?

The following code maps a function over an iterable. The function being applied to each element runs a docker container in order to compute its return value:

import subprocess

def task(arg):
    return subprocess.check_output(
        ["docker", "run", "ubuntu", "bash", "-c", f"echo 'result_{arg}'"]
    )

args = [1, 2, 3]
for result in map(task, args):
    print(result.decode("utf-8").strip())

This prints:

result_1
result_2
result_3
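Before reaching for a cluster, note that the pattern itself parallelizes easily with the standard library, since each task blocks on an external process rather than on python code. Below is a minimal local sketch; it substitutes a plain `echo` for `docker run` so it runs on a POSIX system without Docker installed (that substitution is mine, not part of the question):

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

def task(arg):
    # Stand-in for the original docker invocation: plain `echo` instead of
    # ["docker", "run", "ubuntu", "bash", "-c", ...], so no Docker is needed.
    return subprocess.check_output(["echo", f"result_{arg}"])

args = [1, 2, 3]
# Threads are enough here: the GIL is released while each subprocess runs,
# so the docker (or echo) invocations proceed concurrently.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = [r.decode("utf-8").strip() for r in pool.map(task, args)]

print(results)  # ['result_1', 'result_2', 'result_3']
```

`pool.map` preserves input order, so the results line up with `args` just as the serial `map` does. The question below is about doing the same thing on cloud resources.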

What is the easiest way to parallelize this computation over cloud compute resources in dask?

For example, it would be nice if one could do something like the following. But this of course does not work, because the Fargate containers in which the python code executes are running the default dask image, and thus cannot spawn a docker container themselves (I am not sure whether or not a solution exists in this "docker-in-docker" direction):

import subprocess

from dask.distributed import Client
from dask_cloudprovider.aws import FargateCluster  # older releases: from dask_cloudprovider import FargateCluster
import dask.bag

def task(arg):
    return subprocess.check_output(
        ["docker", "run", "ubuntu", "bash", "-c", f"echo 'result_{arg}'"]
    )

cluster = FargateCluster(n_workers=1)
client = Client(cluster)
args = [1, 2, 3]
for result in dask.bag.from_sequence(args).map(task).compute():
    print(result)

I am looking for a solution that doesn't involve housing unrelated code in the same docker image. That is, I want the docker image my task uses for its computation to be an arbitrary third-party image that I do not have to alter by adding python/dask dependencies. So I think that rules out solutions based on altering the image used by a worker node under dask_cloudprovider.FargateCluster/ECSCluster, since that image would have to house the python/dask dependencies.

Pulling a container onto a kubernetes node has significant overhead and can really only be justified if the task is long-running (minutes, hours). dask is oriented towards low-overhead, python-based tasks.

In my opinion, dask is not the right tool for executing tasks that are container images. Several other technologies better support execution of container-based tasks/workflows (Airflow's KubernetesExecutor or Argo Workflows, for example).
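For a sense of what a container-native tool looks like here, the sketch below shows roughly how the question's fan-out might be expressed as an Argo Workflows manifest: each item spawns its own pod running an arbitrary third-party image, with no python/dask dependencies inside it. The names (`result-`, `fan-out`, `run-task`) are illustrative, not from the question:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: result-
spec:
  entrypoint: fan-out
  templates:
    - name: fan-out
      steps:
        - - name: task
            template: run-task
            arguments:
              parameters:
                - name: arg
                  value: "{{item}}"
            withItems: [1, 2, 3]   # one pod per element, run in parallel
    - name: run-task
      inputs:
        parameters:
          - name: arg
      container:
        image: ubuntu            # any unmodified third-party image
        command: [bash, -c]
        args: ["echo 'result_{{inputs.parameters.arg}}'"]
```

Here the container image is the unit of work, which is exactly the shape of the problem in the question.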

What you might consider is using dask_kubernetes inside a container-based task to spin up an ephemeral cluster for the purposes of executing the computational work required.

