
On an NVIDIA host with 2 GPUs, how can two remote users each use one GPU via srun under SLURM?

Original post: 2022-11-22 20:45:22 · Tags: gpu, nvidia, slurm

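I have an NVIDIA host with 2 GPUs, and two different remote users need to use a GPU on that host. When each of them runs a task through srun, managed by SLURM, one of them gets GPU resources immediately, but the other stays in the queue waiting for resources. Yet there are two GPUs. Why doesn't each user get one GPU? I have tried several alternatives in the srun parameters, but it seems that when srun is used interactively, whoever manages to start a job first holds the entire machine until that job finishes.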

1 Answer

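Assuming Slurm is correctly configured to allow node sharing (the SelectType option) and to manage GPUs as generic resources (the GresType option), you can run scontrol show node and compare the AllocTRES and CfgTRES fields in its output.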
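The comparison of CfgTRES and AllocTRES can be sketched as a small Python helper. This is only an illustration: the sample text below is a made-up excerpt of scontrol show node output (the node name and every value are assumptions), and the helper names parse_tres, to_number, and free_tres are hypothetical. On a real system you would feed it the actual output of scontrol show node for your node.

```python
import re

# Hypothetical excerpt of `scontrol show node` output for a 2-GPU node.
# The node name and all values are invented for illustration.
SAMPLE = """\
NodeName=gpunode01 Arch=x86_64 CoresPerSocket=8
   CfgTRES=cpu=16,mem=64000M,billing=16,gres/gpu=2
   AllocTRES=cpu=16,mem=8000M,gres/gpu=1
"""

# Memory suffixes, normalised to MiB.
UNITS = {"K": 1 / 1024, "M": 1, "G": 1024, "T": 1024 * 1024}


def parse_tres(text, field):
    """Return the TRES list of `field` (e.g. 'CfgTRES') as a dict of strings."""
    match = re.search(rf"^\s*{field}=(\S*)", text, flags=re.MULTILINE)
    if not match or not match.group(1):
        return {}  # field missing, or empty because nothing is allocated
    return dict(item.split("=", 1) for item in match.group(1).split(","))


def to_number(value):
    """Convert '16' or '64000M' to a float (memory expressed in MiB)."""
    if value[-1] in UNITS:
        return float(value[:-1]) * UNITS[value[-1]]
    return float(value)


def free_tres(text):
    """Configured minus allocated amount for every configured resource."""
    cfg = parse_tres(text, "CfgTRES")
    alloc = parse_tres(text, "AllocTRES")
    return {k: to_number(v) - to_number(alloc.get(k, "0")) for k, v in cfg.items()}


free = free_tres(SAMPLE)
for name, amount in free.items():
    print(f"{name}: {amount:g} free")
```

In this invented sample one GPU is still free, but every CPU is already allocated, so a second job would stay pending even though it asks for only one GPU.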

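This shows which resources are still available and helps find out why job 2 is pending. Maybe job 1 used the --exclusive parameter? Maybe job 1 requested all the CPUs or all the memory? Maybe job 1 requested all the GPUs? And so on.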
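One way to rule out these causes is for each user to request one GPU explicitly, together with a bounded share of CPUs and memory, and to leave out --exclusive. The Python sketch below only builds such an srun command line and checks that two identical requests fit on one node. The helper names (srun_one_gpu, fits_twice) and the node size (16 CPUs, 64000 MB, 2 GPUs) are assumptions for illustration; whether an unbounded request really takes the whole node depends on the site's Slurm configuration.

```python
import shlex


def srun_one_gpu(cpus, mem_mb, command=("bash",)):
    """Build an interactive srun command line that asks for exactly one GPU."""
    return [
        "srun",
        "--gres=gpu:1",             # one GPU as a generic resource
        f"--cpus-per-task={cpus}",  # bounded CPU share instead of all cores
        f"--mem={mem_mb}M",         # bounded memory instead of the whole node
        "--pty",                    # interactive pseudo-terminal
        *command,
    ]


def fits_twice(node_cpus, node_mem_mb, node_gpus, cpus, mem_mb):
    """True if two identical one-GPU requests fit on the node together."""
    return (
        2 * cpus <= node_cpus
        and 2 * mem_mb <= node_mem_mb
        and 2 <= node_gpus
    )


# Hypothetical node: 16 CPUs, 64000 MB of memory, 2 GPUs.
cmd = srun_one_gpu(cpus=8, mem_mb=16000)
print(shlex.join(cmd))
print("two such jobs fit:", fits_twice(16, 64000, 2, cpus=8, mem_mb=16000))
```

If both users start their sessions with a command of this shape and the second job still pends, the scontrol show node comparison described above should indicate which resource is exhausted.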


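Notice: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.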
