SLURM: cannot submit CPU jobs once all GPUs are allocated

Question

We have just started using Slurm to manage our GPUs (currently just 2). We use Ubuntu 14.04 and slurm-llnl. I have configured gres.conf and srun works. The problem is that if I run two jobs with --gres=gpu:1, the two GPUs are successfully allocated and the jobs start running. I then expect to be able to run more jobs (in addition to the 2 GPU jobs) without --gres=gpu:1 (i.e. jobs that only use CPU and RAM), but it is not possible.

The error message says that it could not allocate the required resources (even though there are 24 CPU cores).

This is my gres.conf:

Name=gpu Type=titanx File=/dev/nvidia0
Name=gpu Type=titanx File=/dev/nvidia1
NodeName=ubuntu Name=gpu Type=titanx File=/dev/nvidia[0-1]

I appreciate any help. Thank you.

Answer 1

Make sure that SelectType in your configuration is select/cons_res, with SelectTypeParameters set to CR_CPU or CR_Core, and that the Shared option of the partition is not set to EXCLUSIVE. Otherwise Slurm allocates full nodes to jobs, so two GPU jobs occupy the whole node and no CPU-only job can start alongside them.
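
As a rough illustration of the settings named above, a slurm.conf excerpt might look like the sketch below. This is not the poster's actual file. The node name ubuntu and the GPU type titanx come from the gres.conf in the question, and CPUs=24 comes from the core count mentioned there. The partition name main is hypothetical, and the Gres= value assumes the typed form matches the gres.conf entries.

```
# slurm.conf (excerpt): a sketch under the assumptions stated above

# Schedule individual cores as consumable resources instead of whole nodes.
SelectType=select/cons_res
SelectTypeParameters=CR_Core

# Enable the gpu generic resource that gres.conf describes.
GresTypes=gpu

# Node name and GPU type taken from gres.conf; CPUs=24 from the question.
NodeName=ubuntu CPUs=24 Gres=gpu:titanx:2 State=UNKNOWN

# "main" is a hypothetical partition name.
# Shared=NO keeps each core with a single job but still lets several jobs
# share the node; Shared=EXCLUSIVE would hand out whole nodes.
PartitionName=main Nodes=ubuntu Default=YES Shared=NO State=UP
```

A change to SelectType needs a restart of slurmctld and slurmd to take effect. Afterwards, scontrol show config should list the active SelectType and SelectTypeParameters values. On newer Slurm releases the partition option is called OverSubscribe rather than Shared, and select/cons_tres takes the place of select/cons_res, so the exact names depend on the installed version.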

Solution 1 (accepted, 2016-05-31 20:59:18)
