PyTorch GPU out of memory
I am running an evaluation script in PyTorch. I have a number of trained models (*.pt files), which I load and move to the GPU, taking 270MB of GPU memory in total. I am using a batch size of 1. For every sample, I load a single image and also move it to the GPU. Then, depending on the sample, I need to run a sequence of these trained models. Some models have a tensor as input and as output. Other models have a tensor as input, but a string as output. The final model in a sequence always has a string as output. The intermediary tensors are temporarily stored in a dictionary. When a model has consumed a tensor input, it is deleted using del. Still, I notice that after every sample, the GPU memory keeps increasing until the entire memory is full.
Below is some pseudocode to give you a better idea of what is going on:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, trained_models):
        super().__init__()
        self.trained_models = trained_models  # Trained models, keyed by name
        self.intermediary = {}  # Intermediary tensors, keyed by step index

    def forward(self, sample):
        for i, elem in enumerate(sample['sequence']):
            name = elem['name']
            key = elem['input']  # Key of the input tensor in self.intermediary
            if name == 'a':
                model = self.trained_models['a']
                out = model(self.intermediary[key])
                del self.intermediary[key]
                self.intermediary[i] = out
            elif name == 'b':
                model = self.trained_models['b']
                out = model(self.intermediary[key])
                del self.intermediary[key]
                self.intermediary[i] = out
            # elif ...

with torch.no_grad():
    trained_models = load_models_from_pt()  # Loaded and moved to GPU, taking 270MB
    model = Model(trained_models)  # Keeps the trained_models in a dictionary by name
    for sample in data_loader:
        # A sample contains a single image and is moved to the GPU
        # A sample also has some other information, but no other tensors
        model.forward(sample)
I have no idea why the GPU is out of memory. Any ideas?
Try adding torch.cuda.empty_cache() after the del.
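The suggestion above could look roughly like the sketch below. It is only an illustration under stated assumptions: `run_sequence`, `gpu_memory_mb`, the `"image"` key, and the two toy models are hypothetical names that do not appear in the question. Keep in mind that `torch.cuda.empty_cache()` only hands back cached blocks that no tensor is using any more; it cannot free memory that is still referenced, so the sketch also clears any leftover dictionary entries and measures `torch.cuda.memory_allocated()` to check whether memory really grows from sample to sample. The code falls back to the CPU when no GPU is present.

```python
import torch


def gpu_memory_mb():
    """Return the memory held by live tensors on the current GPU, in MB (0.0 without CUDA)."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 1024 ** 2


def run_sequence(models, sequence, image):
    """Run a chain of models, dropping each intermediary tensor once it is consumed."""
    intermediary = {"image": image}  # "image" is a hypothetical key for the input
    result = None
    with torch.no_grad():
        for i, elem in enumerate(sequence):
            # pop() removes the dictionary's reference to the consumed input
            out = models[elem["name"]](intermediary.pop(elem["input"]))
            if torch.is_tensor(out):
                intermediary[i] = out
            else:
                result = out  # string output
    intermediary.clear()  # drop any tensor that was never consumed
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return unused cached blocks to the driver
    return result


device = "cuda" if torch.cuda.is_available() else "cpu"
models = {
    "a": torch.nn.Linear(4, 4).to(device),  # toy model: tensor in, tensor out
    "b": lambda t: str(tuple(t.shape)),     # toy model: tensor in, string out
}
sequence = [{"name": "a", "input": "image"}, {"name": "b", "input": 0}]

before = gpu_memory_mb()
result = run_sequence(models, sequence, torch.zeros(1, 4, device=device))
print(result, f"growth: {gpu_memory_mb() - before:.2f} MB")
```

If the reported growth stays above zero after every sample, something is probably still holding a reference to a tensor (for example, an entry left behind in the dictionary), and `empty_cache()` alone is unlikely to help.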