
Python multiprocessing on multiple CPUs, GPUs

I have 8 GPUs, 64 CPU cores (multiprocessing.cpu_count() = 64).

I am trying to run inference on multiple video files using a deep learning model. I want some of the files to be processed on each of the 8 GPUs, and for each GPU I want a different 6 CPU cores utilized.

Below is the Python file, named inference_{gpu_id}.py:

Input1: GPU_id

Input2: Files to process for GPU_id

import sys
import cv2
import numpy as np
from torch.multiprocessing import Pool, Process, set_start_method
from tqdm import tqdm

try:
    set_start_method('spawn', force=True)
except RuntimeError:
    pass

gpu_id = sys.argv[1]  # Input1: the GPU id (assumed here to come from the command line)

model = load_model(device='cuda:' + gpu_id)

def pooling_func(file):
    preds = []
    cap = cv2.VideoCapture(file)
    while cap.isOpened():
        ret, frame = cap.read()
        if ret:
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            pred = model(frame)[0]
            preds.append(pred)
        else:
            break
    cap.release()
    # save the per-frame predictions next to the video file
    np.save(file[:-4] + '.npy', preds)

def process_files():

    # all files to process on gpu_id
    files = np.load(gpu_id + '_files.npy') 

    # I am hoping to use 6 cores for this gpu_id, 
    # and a different 6 cores for a different GPU id
    pool = Pool(6) 

    r = list(tqdm(pool.imap(pooling_func, files), total = len(files)))
    pool.close()
    pool.join()

if __name__ == '__main__':
    import multiprocessing
    multiprocessing.freeze_support()
    process_files()

I am hoping to run the inference_{gpu_id}.py files on all GPUs simultaneously.

Currently, I am able to run it successfully on one GPU with 6 cores, but when I try to run it on all GPUs together, only GPU 0 runs and all the others stop with the error message below.

RuntimeError: CUDA error: invalid device ordinal.

The scripts I am running:

CUDA_VISIBLE_DEVICES=0 inference_0.py

CUDA_VISIBLE_DEVICES=1 inference_1.py

...

CUDA_VISIBLE_DEVICES=7 inference_7.py

The following was originally an answer to a question you asked but later deleted.


Consider this: if you are not using the CUDA_VISIBLE_DEVICES flag, then all GPUs will be available to your PyTorch process. This means torch.cuda.device_count() will return 8 (assuming your setup is valid). And you will be able to access each one of those 8 GPUs with torch.device, via torch.device('cuda:0'), torch.device('cuda:1'), ..., torch.device('cuda:7').
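As a minimal sketch of that enumeration (assuming an unrestricted 8-GPU machine):

import torch

# No CUDA_VISIBLE_DEVICES restriction: every physical GPU is visible.
print(torch.cuda.device_count())  # -> 8 on the machine described above

# Each GPU is addressed by its ordinal, 0 through 7:
devices = [torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]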

Now if you are only planning on using one device and want to restrict your process to it, then CUDA_VISIBLE_DEVICES=i (where i is the device ordinal) will make it so. In this case torch.cuda will only have access to a single device, through torch.device('cuda:0'). It doesn't matter what the actual device ordinal is; the way you access it is through torch.device('cuda:0').
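For example, a minimal check script (the physical index 3 is arbitrary):

# Launched as: CUDA_VISIBLE_DEVICES=3 python check.py
import torch

print(torch.cuda.device_count())  # -> 1: only one device is visible
device = torch.device('cuda:0')   # physical GPU 3, seen here as cuda:0
print(torch.cuda.get_device_name(device))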

If you allow access to more than one device, say n°0, n°4, and n°2, then you would use CUDA_VISIBLE_DEVICES=0,4,2. Consequently you refer to your cuda devices via d0 = torch.device('cuda:0'), d1 = torch.device('cuda:1'), and d2 = torch.device('cuda:2'), in the same order as you defined them with the flag, i.e.:

d0 -> GPU n°0, d1 -> GPU n°4, and d2 -> GPU n°2.
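In code, that mapping looks like this (a sketch, assuming the process was launched with the flag above):

# Launched as: CUDA_VISIBLE_DEVICES=0,4,2 python multi.py
import torch

d0 = torch.device('cuda:0')  # physical GPU n°0
d1 = torch.device('cuda:1')  # physical GPU n°4
d2 = torch.device('cuda:2')  # physical GPU n°2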

This makes it so you can use the same code and run it on different GPUs without having to change the underlying code where you refer to the device ordinal.

In summary, what you need to look at is the number of devices you need to run your code. In your case, 1 is enough. You will refer to it with torch.device('cuda:0'). When running your code, however, you will need to specify what that cuda:0 device is, with the flag:

> CUDA_VISIBLE_DEVICES=0 inference.py
> CUDA_VISIBLE_DEVICES=1 inference.py
  ...
> CUDA_VISIBLE_DEVICES=7 inference.py

Do note that 'cuda' will default to 'cuda:0'.
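Applied to the script in the question, this means the device string should not depend on gpu_id at all (a sketch; load_model is the question's own helper):

import torch

# CUDA_VISIBLE_DEVICES=<gpu_id> already selects the physical GPU, so inside
# each process the single visible device is always ordinal 0:
device = torch.device('cuda:0')      # never 'cuda:' + gpu_id here
model = load_model(device='cuda:0')  # or simply device='cuda'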
