
How to train a model with multiple GPUs in PyTorch?

My server has two GPUs. How can I use both GPUs for training at the same time to maximize their computing power? Is my code below correct? Does it allow my model to be trained properly?

import torch
import torch.nn as nn

# pretrained_model is an already-loaded BERT-style encoder
class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.bert = pretrained_model
        # for param in self.bert.parameters():
        #     param.requires_grad = True
        self.linear = nn.Linear(2048, 4)

    # def forward(self, input_ids, token_type_ids, attention_mask):
    def forward(self, input_ids, attention_mask):
        batch = input_ids.size(0)
        # output = self.bert(input_ids, token_type_ids, attention_mask).pooler_output
        output = self.bert(input_ids, attention_mask).last_hidden_state
        print('last_hidden_state', output.shape)  # (batch_size, seq_len, hidden_size)
        # output = output.view(batch, -1)
        output = output[:, -1, :]  # keep the last token: (batch_size, hidden_size)
        output = self.linear(output)
        return output

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if torch.cuda.device_count() > 1:
    print("Use", torch.cuda.device_count(), 'gpus')
    model = MyModel()
    model = nn.DataParallel(model)
    model = model.to(device)

There are two different ways to train on multiple GPUs:

  1. Data Parallelism = splitting a large batch that can't fit into a single GPU's memory across multiple GPUs, so every GPU processes a small batch that fits into its memory.
  2. Model Parallelism = splitting the layers within the model across different devices; it is a bit tricky to manage and deal with (see the sketch after this list).
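
As an illustration of point 2, here is a minimal model-parallel sketch (not from the original answer): a toy two-layer network whose layers live on different GPUs, so the intermediate activations must be moved between devices in forward. It assumes at least two GPUs are visible; the class name and layer sizes are made up for the example.

import torch
import torch.nn as nn

# Hypothetical toy model: layer1 lives on cuda:0, layer2 on cuda:1
class TwoDeviceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(1024, 512).to('cuda:0')
        self.layer2 = nn.Linear(512, 4).to('cuda:1')

    def forward(self, x):
        x = self.layer1(x.to('cuda:0'))
        # move the intermediate activations to the second GPU
        x = self.layer2(x.to('cuda:1'))
        return x

model = TwoDeviceModel()
out = model(torch.randn(8, 1024))  # output lives on cuda:1
loss = out.sum()
loss.backward()                    # autograd handles the cross-device graph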

Please refer to this post for more information.

To do Data Parallelism in pure PyTorch, please refer to this example that I created a while back and updated to the latest changes of PyTorch (as of today, 1.12).
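
The linked example is not reproduced here; the following is only a minimal DistributedDataParallel sketch, assuming a single machine with two GPUs and one process per GPU launched with torch.multiprocessing. The model, tensor shapes, and port number are placeholders, not taken from the linked example.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # placeholder model; each process holds its own replica on its own GPU
    model = nn.Linear(1024, 4).to(rank)
    model = DDP(model, device_ids=[rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    for _ in range(10):
        # in real code a DistributedSampler would shard the dataset per rank
        x = torch.randn(16, 1024, device=rank)
        y = torch.randint(0, 4, (16,), device=rank)
        loss = nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()   # gradients are all-reduced across GPUs here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)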

To utilize other libraries to do multi-GPU training without engineering many things yourself, I would suggest using PyTorch Lightning, as it has a straightforward API and good documentation for learning how to do multi-GPU training using Data Parallelism.
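
As a rough illustration, here is a minimal Lightning sketch, assuming a recent PyTorch Lightning release (1.6 or later) where the Trainer accepts accelerator='gpu', devices=2, strategy='ddp'; LitModel and your_dataloader are placeholder names, not from the original answer.

import pytorch_lightning as pl
import torch
import torch.nn as nn

class LitModel(pl.LightningModule):  # placeholder LightningModule
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(1024, 4)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.cross_entropy(self.net(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Lightning handles process launching, device placement, and gradient sync
trainer = pl.Trainer(accelerator='gpu', devices=2, strategy='ddp', max_epochs=3)
# trainer.fit(LitModel(), train_dataloaders=your_dataloader)  # your_dataloader is a placeholder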

I use data parallelism. I refer to this link; it is a useful reference: https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
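
For reference, a minimal nn.DataParallel sketch in the spirit of that tutorial (the model and tensor sizes are placeholders, not taken from the tutorial): DataParallel replicates the model on each visible GPU, splits every input batch across them, and gathers the outputs back on the default device.

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(1024, 4)          # placeholder model
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicates the model and splits each batch across GPUs
model = model.to(device)

x = torch.randn(32, 1024).to(device)  # a batch of 32 is split, e.g. 16 per GPU with 2 GPUs
out = model(x)                         # outputs are gathered back on the default device
print(out.shape)                       # torch.Size([32, 4])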
