SLURM: After allocating all GPUs, no more CPU jobs can be submitted

We have just started using Slurm to manage our GPUs (currently just 2). We use Ubuntu 14.04 and slurm-llnl. I have configured gres.conf and srun works. The problem is this: if I run two jobs with --gres=gpu:1, the two GPUs are successfully allocated and the jobs start running. I then expect to be able to run more jobs (in addition to the 2 GPU jobs) without --gres=gpu:1 (i.e. jobs that only use CPU and RAM), but that is not possible.

The error message says that it could not allocate required resources (even though there are 24 CPU cores).

This is my gres.conf:

Name=gpu Type=titanx File=/dev/nvidia0
Name=gpu Type=titanx File=/dev/nvidia1
NodeName=ubuntu Name=gpu Type=titanx File=/dev/nvidia[0-1]

I appreciate any help. Thank you.

Make sure that SelectType in your slurm.conf is select/cons_res, with SelectTypeParameters set to CR_CPU or CR_Core, and that the Shared option of the partition is not set to EXCLUSIVE. Otherwise Slurm allocates full nodes to jobs.
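
For illustration, the relevant part of slurm.conf might look roughly like the fragment below. This is only a sketch under assumptions, not your actual configuration: CR_Core is a value of SelectTypeParameters and is used together with SelectType=select/cons_res, the node name ubuntu and the 24 cores are taken from the question, and the partition name debug is made up.

```
# slurm.conf (fragment, illustrative only)

# Allocate individual cores instead of whole nodes.
SelectType=select/cons_res
SelectTypeParameters=CR_Core

# Declare the GPU resource type and the node's GPUs and cores.
GresTypes=gpu
NodeName=ubuntu CPUs=24 Gres=gpu:2 State=UNKNOWN

# Partition name "debug" is hypothetical. Shared is left at its default;
# the point is that it must not be Shared=EXCLUSIVE.
PartitionName=debug Nodes=ubuntu Default=YES State=UP
```

After changing the select plugin, the Slurm daemons generally need to be restarted for the new setting to take effect. It may also be worth checking how many CPUs each GPU job requests, since CPU-only jobs can only start on cores that are still free.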
