How to use multiple GPUs in PyTorch?
I use this command to use a single GPU:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
But I want to use two GPUs in Jupyter, like this:
device = torch.device("cuda:0,1" if torch.cuda.is_available() else "cpu")
Using multiple GPUs is as simple as wrapping a model in DataParallel and increasing the batch size. Check these two tutorials for a quick start:
Assuming that you want to distribute the data across the available GPUs (if you have a batch size of 16 and 2 GPUs, you might be looking at providing 8 samples to each GPU), rather than spreading parts of the model across different GPUs, this can be done as follows:
If you want to use all the available GPUs:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CreateModel()
model = nn.DataParallel(model)
model.to(device)
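For illustration, here is a hedged sketch of how a forward pass behaves once the model is wrapped (CreateModel and the input shape are placeholders; this assumes the model takes 10 input features): with 2 visible GPUs and a batch of 16, DataParallel splits the batch into two chunks of 8, runs them in parallel, and gathers the outputs on the default GPU.

X = torch.randn(16, 10).to(device)   # a toy batch of 16 samples (hypothetical shape)
out = model(X)                       # each GPU processes 8 samples; output is gathered on cuda:0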
If you want to use specific GPUs (for example, using 2 out of 4 GPUs):
device = torch.device("cuda:1,3" if torch.cuda.is_available() else "cpu") ## specify the GPU id's, GPU id's start from 0.
model = CreateModel()
model= nn.DataParallel(model,device_ids = [1, 3])
model.to(device)
To use specific GPUs by setting an OS environment variable:
Before executing the program, set the CUDA_VISIBLE_DEVICES variable as follows:
export CUDA_VISIBLE_DEVICES=1,3
(assuming you want to select the 2nd and 4th GPUs)
Then, within the program, you can just use DataParallel() as though you wanted to use all the GPUs (similar to the first case). Here, the GPUs available to the program are restricted by the OS environment variable.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CreateModel()
model = nn.DataParallel(model)
model.to(device)
In all of these cases, the data has to be mapped to the device. If X and y are the data:
X = X.to(device)  # .to() on a tensor returns a copy, so reassign the result
y = y.to(device)
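For example, a hedged sketch of a training loop where each batch is mapped to the device (loader, criterion and optimizer are assumed to exist already and are placeholders):

for X, y in loader:
    X, y = X.to(device), y.to(device)   # move each batch onto the device
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()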
Another option would be to use some helper libraries for PyTorch:
These provide the concept of a context manager for distributed configuration:
This is possibly the best option, IMHO, to train on CPU/GPU/TPU without changing your original PyTorch code.
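As a rough, hedged sketch of what such a context-manager-based setup can look like, assuming a helper library such as PyTorch-Ignite and its ignite.distributed module (CreateModel and the training function body are placeholders):

import ignite.distributed as idist

def training(local_rank, config):
    device = idist.device()                   # right device for this process (GPU/CPU/TPU)
    model = idist.auto_model(CreateModel())   # wraps the model for the current distributed setup
    # ... usual PyTorch training loop, otherwise unchanged

with idist.Parallel(backend="nccl", nproc_per_node=2) as parallel:
    parallel.run(training, {})                # spawns and configures the worker processes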
It is worth checking Catalyst for similar distributed GPU options.
In 2022, PyTorch says:
It is recommended to use DistributedDataParallel, instead of this class, to do multi-GPU training, even if there is only a single node. See: Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel and Distributed Data Parallel.
From https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html#torch.nn.DataParallel
Thus, it seems that we should use DistributedDataParallel, not DataParallel.
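As a rough, hedged sketch of what the recommended single-node DistributedDataParallel setup looks like (assuming a launch such as torchrun --nproc_per_node=2 train.py; CreateModel is the same placeholder constructor used above):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")     # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = CreateModel().to(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # ... usual training loop; give the DataLoader a DistributedSampler so
    # each process sees its own shard of the data.

    dist.destroy_process_group()

if __name__ == "__main__":
    main()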
When I ran naiveinception_googlenet, the above methods didn't work for me. The following method solved my problem.
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,3"  # specify which GPU(s) to be used
If you want to run your code only on specific GPUs (e.g. only on GPUs 2 and 3), you can specify that using the CUDA_VISIBLE_DEVICES=2,3 variable when triggering the Python code from the terminal.
CUDA_VISIBLE_DEVICES=2,3 python lstm_demo_example.py --epochs=30 --lr=0.001
and inside the code, leave it as:
device = torch.device("cuda" if torch.cuda.is_available() else 'cpu')
model = LSTMModel()
model = nn.DataParallel(model)
model = model.to(device)
Source: https://glassboxmedicine.com/2020/03/04/multi-gpu-training-in-pytorch-data-and-model-parallelism/