
PyTorch GPU out of memory

I am running an evaluation script in PyTorch. I have a number of trained models (*.pt files), which I load and move to the GPU, taking in total 270MB of GPU memory. I am using a batch size of 1. For every sample, I load a single image and also move it to the GPU. Then, depending on the sample, I need to run a sequence of these trained models. Some models have a tensor as both input and output. Other models have a tensor as input, but a string as output. The final model in a sequence always has a string as output. The intermediary tensors are temporarily stored in a dictionary. When a model has consumed a tensor input, it is deleted using del. Still, I notice that after every sample, the GPU memory keeps increasing until the entire memory is full.

Below is some pseudocode to give you a better idea of what is going on:

with torch.no_grad():
    trained_models = load_models_from_pt() # Loaded and moved to GPU, taking 270MB
    model = Model(trained_models) # Keeps the trained_models in a dictionary by name
    for sample in data_loader:
        # A sample contains a single image and is moved to the GPU
        # A sample also has some other information, but no other tensors
        model(sample)  # call the module, not .forward() directly

class Model(nn.Module):
    def __init__(self, trained_models):
        super().__init__()
        self.trained_models = trained_models
        self.intermediary = {}

    def forward(self, sample):
        for i, elem in enumerate(sample['sequence']):
            name = elem['name']
            key = elem['input']  # 'in' is a reserved word in Python
            if name == 'a':
                model = self.trained_models['a']
                out = model(self.intermediary[key])
                del self.intermediary[key]
                self.intermediary[i] = out
            elif name == 'b':
                model = self.trained_models['b']
                out = model(self.intermediary[key])
                del self.intermediary[key]
                self.intermediary[i] = out
            elif ...

I have no idea why the GPU is out of memory. Any ideas?

Try adding torch.cuda.empty_cache() after the del.
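The idea behind the suggestion: `del` only removes the Python reference, and PyTorch's caching allocator keeps the freed block reserved, so tools like nvidia-smi still show the memory as used. A minimal sketch of the pattern, using a hypothetical helper `free_intermediate` around a dict like the question's `self.intermediary` (the guarded import just lets the sketch run on machines without PyTorch):

```python
try:
    import torch
    _HAVE_TORCH = True
except ImportError:  # assumption: allow the sketch to run without PyTorch
    _HAVE_TORCH = False

def free_intermediate(store, key):
    """Drop a cached tensor and release its cached GPU memory.

    Hypothetical helper for illustration; `store` plays the role of
    `self.intermediary` in the question.
    """
    del store[key]  # removes the Python reference only
    if _HAVE_TORCH and torch.cuda.is_available():
        # Return unused cached blocks to the GPU driver so other
        # processes (and nvidia-smi) see the memory as free again.
        torch.cuda.empty_cache()
```

Note that empty_cache() does not free tensors that are still referenced anywhere (e.g. entries left in the dictionary between samples); it only releases memory that is already unreferenced but cached.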
