
Strategy to distribute large number of jobs with dask on HPC cluster

I have a rather complex Python algorithm that I need to distribute across an HPC cluster.

The code is run from a JupyterHub instance with 60 GB of memory. The PBS cluster is configured with 1 process, 1 core, and 30 GB per worker, with nanny=False (the computations won't run otherwise), for a total of 26 workers (about 726 GB of memory in total).

I do not need to fetch back any data, since everything that is needed is written to disk at the end of each computation. Note that each computation takes about 7 minutes when run independently.

The problem I run into is the following: each individual worker (Jobname: dask-worker) seems to run fine; it has about 20 GB available, of which at most 5 GB is used. But whenever I try to launch more than about 50 jobs, the central worker (Jobname: jupyterhub) runs out of memory after about 20 minutes.

Here is how I distribute the computations:

def complex_python_func(params):
    return compute(params=params).run()

I then tried to use client.map or dask.delayed, as follows:

list_of_params = [1, 2, 3, 4, 5, ... n] # with n > 256

# With delayed
lazy = [dask.delayed(complex_python_func)(l) for l in list_of_params]
futures = client.compute(lazy)
# Or with map
chain = client.map(complex_python_func, list_of_params)

Here is the configuration of the cluster:

cluster = PBSCluster(
    cores=1,
    memory="30GB",
    interface="ib0",
    queue=queue,
    processes=1,
    nanny=False,
    walltime="12:00:00",
    shebang="#!/bin/bash",
    env_extra=env_extra,
    python=python_bin,
)
cluster.scale(32)

I can't understand why it does not work. I would expect Dask to run each computation and then release the memory (roughly every 6-7 minutes for each individual task). I checked the memory usage of the worker with qstat -f jobId, and it keeps increasing until the worker is killed.

What is causing the jupyterhub worker to fail, and what would be a good (or at least a better) way of achieving this?

Two potential leads are:

  1. If the workers are not expected to return anything, it might be worth changing the return statement to return None (it is not clear what compute() does in your script):
def complex_python_func(params):
    return compute(params=params).run()
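A minimal sketch of that change, assuming compute(params=params).run() does its real work as a side effect (writing to disk) and its return value is not needed; since compute() is not shown in the question, a placeholder computation stands in for it here:

```python
def complex_python_func(params):
    # stand-in for compute(params=params).run(): the real work
    # writes its output to disk as a side effect
    _result = sum(range(params))  # placeholder computation
    # return None so no potentially large result object is shipped
    # back to, and retained by, the client (JupyterHub) process
    return None
```

With client.map, each future then resolves to None rather than a large object, so the client process only ever holds trivially small results.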
  2. It is possible that Dask allocates more than one task per worker, and at some point a worker has more tasks than it can handle. One way out of this is to reduce the number of tasks that a worker can take at any given time with resources, e.g. using:
# add resources when creating the cluster
cluster = PBSCluster(
    # all other settings are unchanged; this line gives each worker
    # one unit of an abstract resource named "foo"
    extra=['--resources foo=1'],
)

# rest of code skipped, but make sure to specify resources needed by task
# when submitting it for computation
lazy = [dask.delayed(complex_python_func)(l) for l in list_of_params]
futures = client.compute(lazy, resources={'foo': 1})
# Or with map
chain = client.map(complex_python_func, list_of_params, resources={'foo': 1})
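The same throttling idea can be tried out locally with a LocalCluster, which accepts resources directly as a worker keyword (the resource name "foo" is arbitrary). This is a sketch for experimenting with the mechanism, not the PBS setup itself:

```python
from dask.distributed import Client, LocalCluster

def side_effect_task(x):
    # placeholder for the real computation, which writes to disk
    return None

def run_throttled(n_tasks=8):
    # each worker advertises one unit of the abstract resource "foo",
    # so it runs at most one "foo"-tagged task at a time
    cluster = LocalCluster(n_workers=2, threads_per_worker=1,
                           processes=False, resources={"foo": 1})
    client = Client(cluster)
    try:
        futures = client.map(side_effect_task, range(n_tasks),
                             resources={"foo": 1})
        # all results are None, so nothing large is kept client-side
        return client.gather(futures)
    finally:
        client.close()
        cluster.close()
```

Each task declares that it needs one unit of "foo", and each worker only has one unit, so the scheduler never assigns a worker more than one of these tasks at a time.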

For more information on resources, see the documentation or this related question: Specifying Task Resources: Fractional gpu.

