Delete model from GPU/CPU in PyTorch

I have a serious memory problem. I am developing a large GUI application for testing and optimizing neural networks. The main program shows the GUI, while training runs in a separate thread. The app needs to train many models with different parameters, one after another, so I create a new model for each attempt. After training one model I want to delete it and train the next, but I cannot get rid of the old model. I am trying something like this:

del model
torch.cuda.empty_cache()

but GPU memory usage doesn't change.

Then I tried this:

model.cpu()
del model

When I move the model to the CPU, GPU memory is freed but CPU memory usage increases. With each training attempt, memory usage keeps growing. Only when I close the app and run it again is all the memory freed.

Is there a way to permanently delete a model from the GPU or CPU?

Edit: Code:

The thread where the training process takes place:

class uczeniegridsearcch(QObject):
     endofoneloop = pyqtSignal()
     endofonesample = pyqtSignal()
     finished = pyqtSignal()
     def __init__(self, train_loader, test_loader, epoch, optimizer, lenoftd, lossfun, numberofsamples, optimparams, listoflabels, model_name, num_of_class, pret):
          super(uczeniegridsearcch, self).__init__()
          self.train_loaderup = train_loader
          self.test_loaderup = test_loader
          self.epochup = epoch
          self.optimizername = optimizer
          self.lenofdt = lenoftd
          self.lossfun = lossfun
          self.numberofsamples = numberofsamples
          self.acc = 0
          self.train_loss = 0
          self.sendloss = 0
          self.optimparams = optimparams
          self.listoflabels = listoflabels
          self.sel_Net = model_name
          self.num_of_class = num_of_class
          self.sel_Pret = pret
          self.modelforsend = []
          

     def setuptrainmodel(self):

          if self.sel_Net == "AlexNet":
               model = models.alexnet(pretrained=self.sel_Pret)
               model.classifier[6] = torch.nn.Linear(4096, self.num_of_class)
          elif self.sel_Net == "ResNet50":
               model = models.resnet50(pretrained=self.sel_Pret)
               model.fc = torch.nn.Linear(model.fc.in_features, self.num_of_class)
          elif self.sel_Net == "VGG13":
               model = models.vgg13(pretrained=self.sel_Pret)
               model.classifier[6] = torch.nn.Linear(model.classifier[6].in_features, self.num_of_class)
          elif self.sel_Net == "DenseNet201":
               model = models.densenet201(pretrained=self.sel_Pret)
               model.classifier = torch.nn.Linear(model.classifier.in_features, self.num_of_class)

          elif self.sel_Net == "MNASnet":
               model = models.mnasnet1_0(pretrained=self.sel_Pret)
               model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, self.num_of_class)

          elif self.sel_Net == "ShuffleNet v2":
               model = models.shufflenet_v2_x1_0(pretrained=self.sel_Pret)
               model.fc = torch.nn.Linear(model.fc.in_features, self.num_of_class)

          elif self.sel_Net == "SqueezeNet":
               model = models.squeezenet1_0(pretrained=self.sel_Pret)
               model.classifier[1] = torch.nn.Conv2d(512, self.num_of_class, kernel_size=(1, 1), stride=(1, 1))
               model.num_classes = self.num_of_class

          elif self.sel_Net == "GoogleNet":
               model = models.googlenet(pretrained=self.sel_Pret)
               model.fc = torch.nn.Linear(model.fc.in_features, self.num_of_class)

          return model
     def train(self):
          
          for x in range(self.numberofsamples):



               torch.cuda.empty_cache()


               modelup = self.setuptrainmodel()
               

               device = torch.device('cuda')

               optimizerup = TableWidget.setupotimfun(self, modelup, self.optimizername, self.optimparams[(x, 0)],
                                                      self.optimparams[(x, 1)], self.optimparams[(x, 2)],
                                                      self.optimparams[(x, 3)],
                                                      self.optimparams[(x, 4)], self.optimparams[(x, 5)])

               modelup = modelup.to(device)



               

               best_accuracy = 0.0
               


               train_error_count = 0
               
               for epoch in range(self.epochup):

                    for images, labels in iter(self.train_loaderup):
                         images = images.to(device)
                         labels = labels.to(device)
                         optimizerup.zero_grad()
                         outputs = modelup(images)
                         loss = TableWidget.setuplossfun(self, lossfun=self.lossfun, outputs=outputs, labels=labels)
                         self.train_loss += loss
                         loss.backward()
                         optimizerup.step()
                         train_error_count += float(torch.sum(torch.abs(labels - outputs.argmax(1))))
                    self.train_loss /= len(self.train_loaderup)

                    test_error_count = 0.0

                    for images, labels in iter(self.test_loaderup):
                         images = images.to(device)
                         labels = labels.to(device)
                         outputs = modelup(images)
                         test_error_count += float(torch.sum(torch.abs(labels - outputs.argmax(1))))

                    test_accuracy = 1.0 - float(test_error_count) / float(self.lenofdt)

                    print('%s, %d,%d: %f %f' % ("Próba nr:", x+1, epoch, test_accuracy, self.train_loss), "Parametry: ", self.optimparams[x,:])

                    self.acc = test_accuracy
                    self.sendloss = self.train_loss.item()
                    self.endofoneloop.emit()


               self.endofonesample.emit()

               modelup.cpu()
               
               del modelup,optimizerup,device,test_accuracy,test_error_count,train_error_count,loss,labels,images,outputs
               torch.cuda.empty_cache()
               

          self.finished.emit()

How I call the thread in the main block:

              self.qtest = uczeniegridsearcch(self.train_loader,self.test_loader, int(self.InputEpoch.text()),
                                              self.sel_Optim,len(self.test_dataset), self.sel_Loss,
                                              int(self.numberofsamples.text()), self.params, self.listoflabels,
                                              self.sel_Net,len(self.sel_ImgClasses),self.sel_Pret)

              self.qtest.endofoneloop.connect(self.inkofprogress)
              self.qtest.endofonesample.connect(self.inksamples)
              self.qtest.finished.connect(self.prints)
              testtret = threading.Thread(target=self.qtest.train)
              testtret.start()

Assuming that the model creation code runs iteratively inside a loop, I suggest the following:

  1. Put the code for model creation, training, evaluation, and model deletion inside a separate function, and call that function from the loop body.
  2. Call gc.collect() after the function call.
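The two steps above could look roughly like the sketch below. It is only an illustration under stated assumptions: `run_trial`, the tiny `nn.Linear` model, and the random batches are hypothetical stand-ins for the real network and data loaders from the question. The loss is accumulated with `loss.item()` so that only a Python float, not a tensor carrying autograd history, survives each step.

```python
import gc

import torch
from torch import nn


def run_trial(lr, device, steps=5):
    """Create, train, and discard one model; return only plain Python numbers."""
    model = nn.Linear(8, 2).to(device)  # hypothetical stand-in for the real network
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    total_loss = 0.0
    for _ in range(steps):
        # Random data stands in for a real DataLoader batch (assumption).
        images = torch.randn(16, 8, device=device)
        labels = torch.randint(0, 2, (16,), device=device)
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        # .item() yields a float; summing the tensor itself would keep
        # each step's autograd history referenced.
        total_loss += loss.item()

    return total_loss / steps  # no tensors leave the function


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
results = []
for lr in (0.1, 0.01, 0.001):
    results.append(run_trial(lr, device))
    gc.collect()  # collect any reference cycles left behind by the trial
    if device.type == "cuda":
        torch.cuda.empty_cache()  # return cached, now-unused blocks to the driver
```

Because `run_trial` returns only a float, nothing outside the function keeps the model, optimizer, or batch tensors reachable once it has finished.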

The rationale for the first point is that the model, optimizer, and intermediate tensors then exist only as local variables of that function. When the function returns, its stack frame is discarded and those references are dropped, which allows the GPU memory to be released.
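The scoping argument can be checked in plain Python, without PyTorch. The sketch below is only an analogy: `FakeModel` is a hypothetical stand-in for a model, and it deliberately contains a reference cycle to show why the `gc.collect()` call can matter even after the function has returned.

```python
import gc
import weakref


class FakeModel:
    """Hypothetical stand-in for a model that owns a large buffer."""

    def __init__(self):
        self.weights = bytearray(10_000_000)
        self.owner = self  # deliberate reference cycle


probe = []  # holds only weak references, which do not keep objects alive


def run_trial():
    model = FakeModel()  # local variable: lives only in this stack frame
    probe.append(weakref.ref(model))
    return len(model.weights)  # return a plain int, not the model


size = run_trial()
gc.collect()  # the cycle keeps the object alive until the collector runs
model_is_gone = probe[0]() is None
print(model_is_gone)
```

The weak reference goes dead once the function has returned and the collector has run, which is the behaviour the suggestion relies on.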
