
On an NVIDIA host with 2 GPUs, how can two remote users each use one GPU via the srun command under SLURM?

I have an NVIDIA host with 2 GPUs, and two different remote users need to use one GPU each on that host. When each of them runs tasks with srun, managed by Slurm, one of them is allocated GPU resources immediately, while the other stays in the queue waiting for resources. But there are two GPUs, so why doesn't each user get one? I have already tried several alternatives in the parameters, but it seems that when srun is used interactively, whoever manages to start a job holds the whole machine until that job finishes.
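Each user starts an interactive session with something along the lines of the following (the CPU and memory values here are only illustrative, not the exact parameters tried):

    srun --gres=gpu:1 --cpus-per-task=4 --mem=16G --pty bash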

Assuming Slurm is correctly configured to allow node sharing (the SelectType option) and to manage GPUs as generic resources (the GresTypes option), you can run scontrol show node and compare the AllocTRES and CfgTRES outputs.
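For reference, node sharing with GPU scheduling is typically set up with something like the following in slurm.conf and gres.conf (the node name, CPU count, memory size, and device paths below are placeholders, not values from the question):

    # slurm.conf
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory
    GresTypes=gpu
    NodeName=gpunode01 Gres=gpu:2 CPUs=32 RealMemory=128000 State=UNKNOWN

    # gres.conf (on the GPU node)
    Name=gpu File=/dev/nvidia0
    Name=gpu File=/dev/nvidia1

With select/cons_tres and per-GPU GRES entries, two jobs that each request --gres=gpu:1 (and less than half of the CPUs and memory) can run on the node at the same time.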

This will show which resources are allocated and help you find out why job 2 is pending. Maybe job 1 used the --exclusive parameter? Maybe job 1 requested all the CPUs or all the memory? Maybe it requested all the GPUs?
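A quick way to check, assuming the node is called gpunode01 and the jobs have ids 1234 and 1235 (all three names are placeholders):

    # Compare configured vs. currently allocated trackable resources on the node
    scontrol show node gpunode01 | grep -E 'CfgTRES|AllocTRES'

    # Inspect the running and the pending job; look at the OverSubscribe,
    # NumCPUs, memory and TRES/GPU fields of each
    scontrol show job 1234
    scontrol show job 1235

If AllocTRES already equals CfgTRES for the CPUs or memory while only one GPU is in use, the first job is consuming the non-GPU resources and that is what keeps the second job pending.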
