How to get the ID of the GPU allocated to a SLURM job on a multi-GPU node?
When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU that is allocated to the job? Is there an environment variable for this purpose? The GPUs I'm using are all Nvidia GPUs.
Thanks.
You can get the GPU id with the environment variable CUDA_VISIBLE_DEVICES. This variable is a comma-separated list of the GPU ids assigned to the job.
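For example, a minimal batch script (a sketch; the --gres value is taken from the question, everything else is illustrative) that prints the assigned id(s) from inside the job:

```shell
#!/bin/bash
#SBATCH --gres=gpu:1

# Slurm sets CUDA_VISIBLE_DEVICES for jobs that request GPUs via --gres.
# On a two-GPU node this typically prints "0" or "1".
echo "Allocated GPU id(s): ${CUDA_VISIBLE_DEVICES}"
```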
Slurm stores this information in the environment variable SLURM_JOB_GPUS.
One way to keep track of such information is to log all SLURM-related variables when running a job, for example (following Kaldi's slurm.pl, which is a great script to wrap Slurm jobs) by including the following command within the script run by sbatch:
set | grep SLURM | while read line; do echo "# $line"; done
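A variant of the same idea (a sketch; it uses printenv instead of set so that only exported environment variables are matched, not shell functions or local variables):

```shell
#!/bin/bash
# Prefix every SLURM_* environment variable with "# " so the job's
# log file records the allocation (including SLURM_JOB_GPUS).
printenv | grep '^SLURM_' | while read -r line; do
    echo "# $line"
done
```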
You can check the environment variables SLURM_STEP_GPUS or SLURM_JOB_GPUS for a given node:
echo ${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}
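Since the value is a comma-separated list when more than one GPU is allocated, a sketch of iterating over the individual ids (variable names here are illustrative):

```shell
#!/bin/bash
# Prefer the step-level variable, fall back to the job-level one.
gpu_ids="${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}"

# Split the comma-separated list (e.g. "0,2") into an array
# and handle each assigned GPU index in turn.
IFS=',' read -ra ids <<< "$gpu_ids"
for id in "${ids[@]}"; do
    echo "assigned GPU index: $id"
done
```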
Note that CUDA_VISIBLE_DEVICES may not correspond to the real value (see @isarandi's comment).
Also, note this should work for non-Nvidia GPUs as well.