Dask - how to assign a task to a specific CPU

I'm using Dask to process research batches, which are quite heavy (from a few minutes to a few hours). There's no communication between the tasks, and they produce only side effects. I'm using a machine that already virtualizes the resources beneath it (~30 CPUs), so I'm just running LocalCluster. Is there any way to assign a specific CPU to a task? The docs only have examples with GPU and memory.

I've tried to assign a CPU in a similar way, but the tasks won't even start processing:

futures = [client.submit(process, d, resources={'CPU': 1}) for d in data]

I suspect this is best achieved by limiting the number of workers (cluster.scale(3)) and setting their process niceness.

CPU time-sharing is really managed by the operating system: Dask decides which worker runs a task, not which core that worker runs on.
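A minimal sketch of the suggestion to limit the number of workers, assuming a recent dask.distributed: with three single-threaded workers, at most three tasks run at once, and the OS decides which cores they land on. `process` is a hypothetical stand-in for a research batch, and `processes=False` (thread-based workers) is used only to keep the sketch self-contained. On a real machine you would keep the default process-based workers and could then lower their priority with something like `client.run(os.nice, 10)`, which calls `os.nice` once in each worker process (Unix only).

```python
import time

from dask.distributed import Client, LocalCluster


def process(d):
    # Hypothetical stand-in for a heavy batch: sleep briefly and report when
    # it started and finished (monotonic clock, shared by all threads here).
    start = time.monotonic()
    time.sleep(0.2)
    return d, start, time.monotonic()


def max_overlap(intervals):
    # Largest number of (start, end) intervals active at the same moment.
    # Ends sort before starts at equal timestamps, so back-to-back tasks on
    # one thread are not counted as overlapping.
    events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


data = range(9)

# Start with one single-threaded worker, then scale to three.
with LocalCluster(n_workers=1, threads_per_worker=1, processes=False,
                  dashboard_address=None) as cluster:
    with Client(cluster) as client:
        cluster.scale(3)            # grow to three workers
        client.wait_for_workers(3)  # block until all three have registered
        n_workers = len(client.scheduler_info()["workers"])

        futures = [client.submit(process, d) for d in data]
        results = client.gather(futures)

peak = max_overlap([(s, e) for _, s, e in results])
print(n_workers, peak)
```

Raising or lowering the worker count is the coarse knob here; which physical CPU each worker uses is still left to the OS scheduler.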

The likely reason the tasks didn't start when you specified

futures = [client.submit(process, d, resources={'CPU': 1}) for d in data]

is that the cluster was started without declaring that each worker has that resource (this has to be done when the workers are started). Here's how to make sure the workers have that resource:

from dask.distributed import Client, LocalCluster
# every worker started by this cluster advertises one unit of an abstract 'CPU' resource
cluster = LocalCluster(resources={'CPU': 1})
client = Client(cluster)
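A minimal sketch, assuming a recent dask.distributed, of what the declared resource actually does. Dask resources are abstract counters, not physical cores: a worker that declares {'CPU': 1} runs at most one task requesting {'CPU': 1} at a time, however many threads it has, and a task requesting a resource that no worker declared (the hypothetical 'GPU' below) is never scheduled and stays pending. `process` is a hypothetical stand-in, and `processes=False` keeps everything in one Python process.

```python
import time

from dask.distributed import Client, LocalCluster


def process(d):
    # Hypothetical stand-in for a research batch; records its run window.
    start = time.monotonic()
    time.sleep(0.2)
    return d, start, time.monotonic()


def max_overlap(intervals):
    # Largest number of (start, end) intervals active at the same moment;
    # ends sort before starts at equal timestamps.
    events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# One worker with four threads that advertises a single abstract "CPU" unit.
with LocalCluster(n_workers=1, threads_per_worker=4, processes=False,
                  resources={"CPU": 1}, dashboard_address=None) as cluster:
    with Client(cluster) as client:
        # Each task claims the one "CPU" unit, so the tasks run one at a time
        # even though the worker has four threads.
        futures = [client.submit(process, d, resources={"CPU": 1}) for d in range(4)]
        results = client.gather(futures)

        # No worker declared "GPU", so this task is never scheduled.
        stuck = client.submit(process, 99, resources={"GPU": 1})
        time.sleep(0.5)
        stuck_status = stuck.status
        stuck.cancel()

peak = max_overlap([(s, e) for _, s, e in results])
print(peak, stuck_status)
```

The same mechanism explains why the original submission hung: without resources declared at worker start-up, no worker can ever satisfy {'CPU': 1}.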

For finer-grained control, it is possible to assign tasks to specific workers. First, get the address of each worker with

list_workers = list(client.scheduler_info()['workers'])

Then specify which worker(s) can complete the task:

# submit for completion only by the first worker in the list
results_specific_worker = [client.submit(process, d, workers=list_workers[0]) for d in data]

# submit for completion by the first two workers
results_specific_workers = [client.submit(process, d, workers=list_workers[0:2]) for d in data]
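A self-contained sketch of worker restrictions, assuming a recent dask.distributed. The hypothetical `process` returns the address of the worker that ran it (via `get_worker()`), so you can check that `workers=` was honoured. This pins tasks to a worker, not to a CPU core; which core a worker's thread uses is still up to the OS. `processes=False` is used only to keep the sketch in one Python process.

```python
from dask.distributed import Client, LocalCluster, get_worker


def process(d):
    # Hypothetical task: report which worker actually executed it.
    return d, get_worker().address


with LocalCluster(n_workers=3, threads_per_worker=1, processes=False,
                  dashboard_address=None) as cluster:
    with Client(cluster) as client:
        client.wait_for_workers(3)
        list_workers = list(client.scheduler_info()["workers"])

        # Restrict every task to the first worker only.
        futures = [client.submit(process, d, workers=list_workers[0]) for d in range(6)]
        ran_on = {addr for _, addr in client.gather(futures)}

        # Restrict to the first two workers; different inputs give new task
        # keys, so these are not deduplicated against the batch above.
        futures2 = [client.submit(process, d, workers=list_workers[0:2]) for d in range(6, 12)]
        ran_on_two = {addr for _, addr in client.gather(futures2)}

print(ran_on, ran_on_two)
```

By default these restrictions are strict; passing `allow_other_workers=True` to `client.submit` turns them into a preference that the scheduler may ignore.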

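Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.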

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM