
How to get multiple GPUs of the same type on Slurm?

How can I submit a job that gets multiple GPUs of the same type without naming that type explicitly? My experiment requires that all GPUs be of the same type, but which type that is does not matter.

Currently I can only get a multi-GPU allocation of a single type by stating exactly which type I want:

--gres=gpu:gres_type:amount

If I don't specify gres_type, I sometimes get a mixed set of GPUs (say, 2x titan V and 2x titan X).

If you are fortunate enough that the cluster is consistent in the types of nodes that host the GPUs, and that the features of the nodes are properly specified and allow distinguishing between the nodes that host the different GPU types, you can use the --constraint parameter.

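For the sake of argument, let's assume that the nodes hosting the titanV have haswell CPUs, that those hosting the titanX have skylake CPUs, and that both are defined as node features. Then you can request

--gres=gpu:2
--constraint="[haswell|skylake]"

The square brackets ask Slurm to pick a single one of the listed features for all allocated nodes. The quotes keep the shell from interpreting the brackets and the pipe when the option is given on the command line.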
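Whether this works depends on how the cluster's features are set up, so it may be worth checking first. The following is a minimal sketch, not a definitive recipe: it lists each distinct pairing of GRES and node features reported by sinfo. The helper name summarize_gres_features is made up for illustration, and the sketch assumes features contain no spaces (Slurm prints them comma-separated).

```shell
#!/bin/bash
# Hypothetical helper: reads "GRES FEATURES" lines on stdin and prints
# each distinct pairing once, aligned in two columns.
summarize_gres_features() {
    sort -u | awk '{ printf "%-24s %s\n", $1, $2 }'
}

# Query Slurm only if sinfo is available on this machine.
# %G = generic resources (GRES) of the node, %f = features of the node.
if command -v sinfo >/dev/null 2>&1; then
    sinfo --noheader --Node --format="%G %f" | summarize_gres_features
fi
```

If every GPU type shows up next to its own distinguishing feature, the --constraint approach should be usable; if two GPU types share the same feature set, it will not tell them apart.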

If the above does not apply to your use case, you can submit two jobs, one per GPU type, and keep only the one that starts first. To do so, give the jobs an identical name and use the singleton dependency, which lets only one job with a given name (per user) run at a time.

Write a submission script like this one:

#!/bin/bash
#SBATCH --dependency=singleton
#SBATCH --job-name=gpujob
# Other options

# This job has started, so cancel the still-pending jobs with the same name
scancel --state=PENDING --jobname=gpujob

# etc.

and submit it twice with

$ sbatch --gres=gpu:titanX:2 submit.sh
$ sbatch --gres=gpu:titanV:2 submit.sh

Each job will be assigned only one type of GPU, and the first one that starts will cancel the other one. The approach extends to more than two GPU types: submit one job per type.
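With more than two GPU types, the submissions can be generated in a loop. This is only a sketch under assumptions: the function name submit_per_type and the DRY_RUN switch are made up for illustration, and the type names must match the GRES types defined on your cluster. By default it only prints the sbatch commands; set DRY_RUN=0 to actually submit.

```shell
#!/bin/bash
# Hypothetical helper: submit_per_type COUNT SCRIPT TYPE...
# Submits SCRIPT once per GPU type, each time requesting COUNT GPUs of that type.
submit_per_type() {
    gpu_count=$1
    script=$2
    shift 2
    for gpu_type in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command instead of submitting it.
            echo "sbatch --gres=gpu:${gpu_type}:${gpu_count} ${script}"
        else
            sbatch --gres="gpu:${gpu_type}:${gpu_count}" "${script}"
        fi
    done
}

# Same two submissions as above, expressed through the loop.
submit_per_type 2 submit.sh titanX titanV
```

Because every job runs the same submit.sh, they all share the job name and the singleton dependency, so whichever starts first cancels the rest.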


 