
PyTorch GPU out of memory

I am running an evaluation script in PyTorch. I have a number of trained models (*.pt files), which I load and move to the GPU, taking in total 270MB of GPU memory. I am using a batch size of 1. For every sample, I load a single image and also move it to the GPU. Then, depending on the sample, I need to run a sequence of these trained models. Some models have a tensor as both input and output. Other models have a tensor as input, but a string as output. The final model in a sequence always has a string as output. The intermediary tensors are temporarily stored in a dictionary. When a model has consumed a tensor input, it is deleted using del. Still, I notice that after every sample, the GPU memory keeps increasing until the entire memory is full.

Below is some pseudocode to give you a better idea of what is going on:

with torch.no_grad():
    trained_models = load_models_from_pt() # Loaded and moved to GPU, taking 270MB
    model = Model(trained_models) # Keeps the trained_models in a dictionary by name
    for sample in data_loader:
        # A sample contains a single image and is moved to the GPU
        # A sample also has some other information, but no other tensors
        model(sample)  # call the module, not .forward() directly

class Model(nn.Module):
    def __init__(self, trained_models):
        super().__init__()
        self.trained_models = trained_models
        self.intermediary = {}

    def forward(self, sample):
        for i, elem in enumerate(sample['sequence']):
            name = elem['name']
            key = elem['input']  # 'in' is a reserved word in Python
            if name == 'a':
                model = self.trained_models['a']
                out = model(self.intermediary[key])
                del self.intermediary[key]
                self.intermediary[i] = out
            elif name == 'b':
                model = self.trained_models['b']
                out = model(self.intermediary[key])
                del self.intermediary[key]
                self.intermediary[i] = out
            elif ...

I have no idea why the GPU is out of memory. Any ideas?

Try adding torch.cuda.empty_cache() after the del.
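The idea behind the suggestion: `del` only removes the Python reference, and PyTorch's caching allocator keeps the freed block reserved, so tools like nvidia-smi still show the memory as used. A minimal sketch of the pattern, using a hypothetical helper `free_intermediate` around a dict like the question's `self.intermediary` (the guarded import just lets the sketch run on machines without PyTorch):

```python
try:
    import torch
    _HAVE_TORCH = True
except ImportError:  # assumption: allow the sketch to run without PyTorch
    _HAVE_TORCH = False

def free_intermediate(store, key):
    """Drop a cached tensor and release its cached GPU memory.

    Hypothetical helper for illustration; `store` plays the role of
    `self.intermediary` in the question.
    """
    del store[key]  # removes the Python reference only
    if _HAVE_TORCH and torch.cuda.is_available():
        # Return unused cached blocks to the GPU driver so other
        # processes (and nvidia-smi) see the memory as free again.
        torch.cuda.empty_cache()
```

Note that empty_cache() does not free tensors that are still referenced anywhere (e.g. entries left in the dictionary between samples); it only releases memory that is already unreferenced but cached.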
