Python multiprocessing on multiple CPUs and GPUs

Question

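I have 8 GPUs and 64 CPU cores (multiprocessing.cpu_count() = 64).

I am trying to run inference on multiple video files with a deep learning model. I want each of the 8 GPUs to process some of the files, and for each GPU I want to use a different set of 6 CPU cores.

Below is the Python file, named inference_{gpu_id}.py: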

Input1: GPU_id

Input2: Files to process for GPU_id

import cv2
import numpy as np
from tqdm import tqdm
from torch.multiprocessing import Pool, Process, set_start_method

try:
    set_start_method('spawn', force=True)
except RuntimeError:
    pass

model = load_model(device='cuda:' + gpu_id) 

def pooling_func(file):
    preds = []
    count = 0  # frames read so far (was used without being initialized)
    cap = cv2.VideoCapture(file)
    while cap.isOpened():
        ret, frame = cap.read()
        count += 1
        if ret:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            pred = model(frame)[0]
            preds.append(pred)
        else:
            break
    cap.release()
    np.save(file[:-4] + '.npy', preds)

def process_files():

    # all files to process on gpu_id
    files = np.load(gpu_id + '_files.npy') 

    # I am hoping to use 6 cores for this gpu_id, 
    # and a different 6 cores for a different GPU id
    pool = Pool(6) 

    r = list(tqdm(pool.imap(pooling_func, files), total = len(files)))
    pool.close()
    pool.join()

if __name__ == '__main__':
    import multiprocessing
    multiprocessing.freeze_support()
    process_files()

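I want to run the inference_{gpu_id}.py files on all GPUs at the same time.

Currently I can run it successfully on one GPU with 6 cores, but when I try to run it on all GPUs together, only GPU 0 runs and all the others stop with the following error message: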

RuntimeError: CUDA error: invalid device ordinal.

The commands I am running:

CUDA_VISIBLE_DEVICES=0 inference_0.py

CUDA_VISIBLE_DEVICES=1 inference_1.py

...

CUDA_VISIBLE_DEVICES=7 inference_7.py

Answer 1

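The following was originally an answer to a question you asked and later deleted.

With that in mind, if you do not use the CUDA_VISIBLE_DEVICES flag, all GPUs are available to your PyTorch process. This means torch.cuda.device_count will return 8 (assuming your setup is valid), and you can access each of those 8 GPUs with torch.device, via torch.device('cuda:0'), torch.device('cuda:1'), ..., and torch.device('cuda:7').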
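As a quick check of the paragraph above, here is a minimal sketch that lists the device strings PyTorch would accept. It assumes PyTorch is installed for the last loop to print anything; `device_names` is a made-up helper for illustration, not part of PyTorch.

```python
try:
    import torch
except ImportError:  # PyTorch not installed: the helper below still works
    torch = None


def device_names(count):
    """Return the device strings that are valid when `count` GPUs are visible."""
    return [f"cuda:{i}" for i in range(count)]


# With PyTorch available, ask how many GPUs this process can see.
count = torch.cuda.device_count() if torch is not None else 0
for name in device_names(count):
    print(name, torch.cuda.get_device_name(torch.device(name)))
```

With 8 visible GPUs the last valid string is 'cuda:7', since ordinals start at 0.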

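Now, if you only plan to use one GPU and want to restrict your process to it, CUDA_VISIBLE_DEVICES=i (where i is the device ordinal) will do that. In this case, torch.cuda can access only a single device, through torch.device('cuda:0'). It doesn't matter what the actual device ordinal is; you access it via torch.device('cuda:0').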
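This is likely where the "invalid device ordinal" error in the question comes from: the script asks for 'cuda:' + gpu_id, but a process started with a single visible GPU only has ordinal 0. The sketch below models that bookkeeping in plain Python, with no GPU required. Both helpers are hypothetical and only handle comma-separated integer IDs (the real variable also accepts GPU UUIDs).

```python
def visible_ordinals(cuda_visible_devices):
    """Ordinals a process can use, given its CUDA_VISIBLE_DEVICES value."""
    ids = [s for s in cuda_visible_devices.split(",") if s.strip()]
    return list(range(len(ids)))


def is_valid(device, cuda_visible_devices):
    """True if a string like 'cuda:3' names a device the process can see."""
    ordinal = int(device.split(":")[1])
    return ordinal in visible_ordinals(cuda_visible_devices)


for gpu_id in ["0", "1", "7"]:
    asked = "cuda:" + gpu_id      # what the question's script requests
    ok = is_valid(asked, gpu_id)  # launched with CUDA_VISIBLE_DEVICES=gpu_id
    status = "ok" if ok else "invalid device ordinal"
    print(f"CUDA_VISIBLE_DEVICES={gpu_id} {asked}: {status}")
```

Only the gpu_id "0" case passes, which matches the observation that GPU 0 runs while the others fail.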

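If you allow access to more than one device, say n°0, n°4, and n°2, you would use CUDA_VISIBLE_DEVICES=0,4,2. You can then refer to your CUDA devices via d0 = torch.device('cuda:0'), d1 = torch.device('cuda:1'), and d2 = torch.device('cuda:2'), in the same order as you listed them in the flag, i.e.: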

d0 -> GPU n°0, d1 -> GPU n°4, and d2 -> GPU n°2.
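The same mapping can be written out as a small pure-Python sketch. `logical_to_physical` is a hypothetical helper, not a PyTorch or CUDA function, and it assumes the flag contains only integer indices.

```python
def logical_to_physical(cuda_visible_devices):
    """Map each in-process device string to the physical GPU index behind it."""
    physical = [int(s) for s in cuda_visible_devices.split(",")]
    return {f"cuda:{logical}": gpu for logical, gpu in enumerate(physical)}


print(logical_to_physical("0,4,2"))
# {'cuda:0': 0, 'cuda:1': 4, 'cuda:2': 2}
```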

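This lets you use the same code and run it on different GPUs without changing the underlying code that refers to the device ordinal.

In summary, what you need to look at is the number of devices your code needs to run. In your case, 1 is enough, and you refer to it with torch.device('cuda:0'). When running your code, however, you specify which physical GPU that cuda:0 device is with the flag: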

> CUDA_VISIBLE_DEVICES=0 inference.py
> CUDA_VISIBLE_DEVICES=1 inference.py
  ...
> CUDA_VISIBLE_DEVICES=7 inference.py

Note that 'cuda' refers to the current CUDA device, which is 'cuda:0' by default.
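For running all eight commands at once, one possible launcher is sketched below. Several things here are assumptions rather than part of the answer: a single inference.py that loads its model on 'cuda:0' and takes the physical GPU id as a command-line argument (for example to pick its file list), and the Linux `taskset` utility to give each process its own block of 6 CPU cores, as the question asks. `build_jobs` and `launch` are made-up names.

```python
import os
import subprocess
import sys


def build_jobs(num_gpus=8, cores_per_gpu=6, script="inference.py"):
    """Return one (env, command) pair per GPU without starting anything."""
    jobs = []
    for gpu in range(num_gpus):
        # Each child sees exactly one GPU, which it addresses as cuda:0.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        first = gpu * cores_per_gpu
        cores = f"{first}-{first + cores_per_gpu - 1}"
        # taskset (Linux) pins the child, and the Pool workers it starts,
        # to this core range. The trailing argument is a hypothetical gpu id.
        cmd = ["taskset", "-c", cores, sys.executable, script, str(gpu)]
        jobs.append((env, cmd))
    return jobs


def launch(jobs):
    """Start every job at once, then wait for all of them to finish."""
    procs = [subprocess.Popen(cmd, env=env) for env, cmd in jobs]
    return [p.wait() for p in procs]


# Dry run: show what would be started. Call launch(build_jobs()) to run it.
for env, cmd in build_jobs():
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", " ".join(cmd))
```

With the defaults this uses cores 0-47 of the 64 available, leaving the rest for the system. On platforms without taskset, the core-pinning part would need a different mechanism.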
