When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU that is allocated to the job? Is there an environment variable for this purpose? The GPUs I'm using are all NVIDIA GPUs. Thanks.
You can get the GPU ID from the environment variable CUDA_VISIBLE_DEVICES. This variable is a comma-separated list of the GPU IDs assigned to the job.
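Since the value is a comma-separated list, a job script usually has to split it before using individual IDs. Here is a minimal sketch of that; the function name list_gpu_ids is just a hypothetical helper for illustration, not something Slurm or CUDA provides:

```shell
#!/bin/sh
# Hypothetical helper: print each ID from a comma-separated list on its own line.
list_gpu_ids() {
    # Nothing to print if the list is empty or unset.
    [ -n "${1:-}" ] || return 0
    printf '%s\n' "$1" | tr ',' '\n'
}

# Inside a job, print the IDs from CUDA_VISIBLE_DEVICES (prints nothing if it is unset).
list_gpu_ids "${CUDA_VISIBLE_DEVICES:-}"
```

With --gres=gpu:1 the list should contain a single entry, so the split only matters when more than one GPU is requested.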
Slurm stores this information in an environment variable, SLURM_JOB_GPUS.
One way to keep track of such information is to log all Slurm-related variables when running a job, for example (following Kaldi's slurm.pl, which is a great script for wrapping Slurm jobs) by including the following command in the script run by sbatch:
set | grep SLURM | while read line; do echo "# $line"; done
You can check the environment variables SLURM_STEP_GPUS or SLURM_JOB_GPUS for a given node:
echo ${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}
Note that CUDA_VISIBLE_DEVICES may not correspond to the real GPU ID on the node (see @isarandi's comment).
Also, note this should work for non-Nvidia GPUs as well.
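To see which variable a given job actually exposes, the checks above can be combined into one fallback chain. This is only a sketch: the function name allocated_gpus is hypothetical, and the order of preference (step variable, then job variable, then CUDA_VISIBLE_DEVICES as a last resort) is my assumption based on the echo line above, not something Slurm prescribes.

```shell
#!/bin/sh
# Hypothetical helper: report the first GPU variable that is set, as NAME=value.
allocated_gpus() {
    if [ -n "${SLURM_STEP_GPUS:-}" ]; then
        printf 'SLURM_STEP_GPUS=%s\n' "$SLURM_STEP_GPUS"
    elif [ -n "${SLURM_JOB_GPUS:-}" ]; then
        printf 'SLURM_JOB_GPUS=%s\n' "$SLURM_JOB_GPUS"
    elif [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        # Last resort: this value may not be the real GPU ID on the node.
        printf 'CUDA_VISIBLE_DEVICES=%s\n' "$CUDA_VISIBLE_DEVICES"
    else
        echo "no GPU variable is set" >&2
        return 1
    fi
}

# Print the result; do not abort the job script if nothing is set.
allocated_gpus || true
```

Printing the variable name along with the value makes it obvious in the job log whether the ID came from Slurm itself or from CUDA_VISIBLE_DEVICES.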