Dask jobqueue - Is there a way to start all workers at the same time?

Say I have the following deployment on SLURM:

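from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# walltime must be a string; Slurm reads it as D-HH:MM:SS
cluster = SLURMCluster(processes=1, cores=25, walltime="1-00:00:00")

cluster.scale(20)  # request 20 workers, each submitted as its own Slurm job
client = Client(cluster)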

So I will have 20 nodes, each with 25 cores.

Is there a way to tell the Slurm scheduler to start all nodes at the same time, instead of starting each one individually as it becomes available?

A specific example: when nodes are started individually, those that start earliest might wait several hours, say 2, until all 20 nodes are ready. This is not only a waste of resources, it also cuts the time during which all nodes can work together to less than 24 hours (e.g. 22 hours).
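The arithmetic behind this example can be sketched as follows. The function name and the evenly spaced start times are assumptions made for illustration; real queue delays will be irregular.

```python
from datetime import timedelta


def shared_window(start_offsets, walltime):
    """Return how long all jobs run at the same time.

    Job i runs from start_offsets[i] until start_offsets[i] + walltime,
    so the shared window opens at the latest start and closes at the
    earliest end.
    """
    last_start = max(start_offsets)
    first_end = min(start_offsets) + walltime
    # If the jobs never overlap, report zero rather than a negative span.
    return max(first_end - last_start, timedelta(0))


# Assumed scenario: 20 jobs starting at even intervals over 2 hours.
starts = [timedelta(hours=2) * i / 19 for i in range(20)]
window = shared_window(starts, walltime=timedelta(hours=24))
```

With these assumed start times the shared window is the 22 hours mentioned above: the first job's 24-hour walltime expires 22 hours after the last job starts.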

This is easy to do with dask_mpi, where a single batch job is allocated. I am wondering whether it is possible with dask_jobqueue specifically.

dask-jobqueue itself does not offer this functionality.

It is designed to submit independent jobs. To achieve this, you would first have to check what the job queuing system (Slurm, in your case) can do, and see whether it is possible without dask-jobqueue. If it is, you can then try to pass the right options through dask-jobqueue, for example via the job_extra_directives kwarg.
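As a rough sketch of how extra scheduler options are passed through, assuming a dask-jobqueue version that accepts job_extra_directives: the memory value and the "--exclusive" directive are placeholders chosen for illustration, and nothing here is known to make jobs start together. CLUSTER_KWARGS and make_cluster are hypothetical names.

```python
# Keyword arguments for SLURMCluster. The memory value and the extra
# directive are illustrative assumptions, not values from the question.
CLUSTER_KWARGS = {
    "processes": 1,
    "cores": 25,
    "memory": "64GB",  # assumed; set this to what your nodes provide
    "walltime": "1-00:00:00",
    # Each entry is written to the job script as one "#SBATCH <entry>" line.
    # "--exclusive" is only a placeholder that shows the mechanism.
    "job_extra_directives": ["--exclusive"],
}


def make_cluster(**overrides):
    """Build the cluster; dask_jobqueue is imported only when this is called."""
    from dask_jobqueue import SLURMCluster

    return SLURMCluster(**{**CLUSTER_KWARGS, **overrides})
```

Calling print(make_cluster().job_script()) shows the generated submission script, which lets you confirm that the directive appears before you scale the cluster.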

I'm not aware of such a feature in Slurm, but it has so many knobs that it is hard to tell. I know this is not possible with PBS.

As you said, a good option to achieve what you want is dask-mpi.
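A minimal sketch of the dask-mpi route, assuming dask-mpi and an MPI launcher are available inside a single batch allocation. The function name connect_via_mpi and the default of 25 threads are assumptions taken from the question's core count, not a recommended setting.

```python
def connect_via_mpi(threads_per_worker=25):
    """Turn the MPI ranks of the current batch job into a Dask cluster.

    Imports are deferred so this module loads even without MPI installed.
    """
    from dask_mpi import initialize
    from dask.distributed import Client

    # Rank 0 becomes the scheduler, rank 1 carries on as the client
    # script, and every remaining rank becomes a worker.
    initialize(nthreads=threads_per_worker)

    # With no address given, Client() connects to the scheduler that
    # initialize() started.
    return Client()
```

The script calling this function would be launched with the MPI launcher inside one batch job, for example mpirun -np N python script.py, which leaves N - 2 ranks as workers. Because the batch system allocates the whole job at once, all workers start together.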

A final thought: you could also start your computation on the first two nodes, without waiting for the others to be ready. This should be doable in most cases.
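One way to sketch this, assuming client is a dask.distributed Client connected to the cluster: wait only for a small number of workers, then submit the work. The helper name run_when_ready and the default of two workers are assumptions for illustration.

```python
def run_when_ready(client, func, items, min_workers=2):
    """Map func over items as soon as min_workers workers have connected."""
    # Blocks until at least min_workers workers are connected.
    client.wait_for_workers(n_workers=min_workers)

    # Submit everything now; the scheduler can hand tasks that are still
    # pending to workers that join later.
    futures = client.map(func, items)
    return client.gather(futures)
```

Used as results = run_when_ready(client, my_function, my_inputs), where my_function and my_inputs stand in for your own workload, the early nodes start computing immediately instead of idling until the last job leaves the queue.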
