
Dask - flexible memory allocation for LocalCluster

I've run into some memory problems while using dask's LocalCluster. I'm working on a machine with 32 CPUs, but I only have 64 GB of RAM available. I'm instantiating the cluster like this:

import os
from dask.distributed import LocalCluster

cluster = LocalCluster(
    n_workers=os.cpu_count(),
    threads_per_worker=1
)

By default, Dask assigns an equal amount of memory to each worker (total RAM divided by the number of workers). I'm using dask to compute research batches, and those batches differ in how much memory they need. There aren't any problems when I process 32 smaller batches, as they fit into memory. The problem comes when I move on to bigger batches that can't fit into the 2 GB of RAM assigned per worker; then dask raises memory allocation errors. I've seen that I can increase the workers' timeout, but that's not a very elegant solution. Is there any way to tell dask to keep a scheduled task in the queue until resources are available? What would be the correct way to handle these queued tasks while using LocalCluster?

One option is to explicitly specify resource requirements for tasks (if you know them in advance); there is a related answer here and in the documentation.

The cluster would be started with the resources={'mem': 2000} option for its workers, and the expected resource use would then be stated when executing tasks with .compute() or .submit(). For example, small tasks could specify client.submit(my_func, small_task, resources={'mem': 1000}) (so at most 2 such tasks run on a worker at a time), while large tasks would specify client.submit(my_func, large_task, resources={'mem': 2000}) (so at most 1 such task runs on a worker at a time).
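Putting it together, a minimal sketch of this approach could look like the following. It assumes that LocalCluster forwards the resources keyword to each worker (as part of its worker keyword arguments), which may depend on your dask version; my_func, small_batches and large_batches are placeholders for your own batch function and batch lists.

import os
from dask.distributed import Client, LocalCluster

# Each worker advertises 2000 units of an abstract "mem" resource.
cluster = LocalCluster(
    n_workers=os.cpu_count(),
    threads_per_worker=1,
    resources={"mem": 2000},
)
client = Client(cluster)

def my_func(batch):
    ...  # your per-batch computation goes here

# Small batches: two can run concurrently on one worker (1000 + 1000 <= 2000).
small_futures = [
    client.submit(my_func, b, resources={"mem": 1000}) for b in small_batches
]

# Large batches: only one at a time per worker (2000 <= 2000); the rest wait
# in the scheduler's queue until the resource is free again.
large_futures = [
    client.submit(my_func, b, resources={"mem": 2000}) for b in large_batches
]

results = client.gather(small_futures + large_futures)

Note that "mem" here is just an abstract label the scheduler uses for bookkeeping; dask does not measure actual memory use, so the declared amounts only work as well as your estimates.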
