
Why am I getting different results after saving and loading model weights in PyTorch?

I have written a model whose architecture is as follows:

CNNLSTM(                                                                                                                                                                                  
  (cnn): CNNText(                                                                                                                                                                         
    (embed): Embedding(19410, 300, padding_idx=0)                                                                                                                                         
    (convs1): ModuleList(                                                                                                                                                                 
      (0): Conv2d(1, 32, kernel_size=(3, 300), stride=(1, 1))                                                                                                                             
      (1): Conv2d(1, 32, kernel_size=(5, 300), stride=(1, 1))                                                                                                                             
      (2): Conv2d(1, 32, kernel_size=(7, 300), stride=(1, 1))                                                                                                                             
    )                                                                                                                                                                                     
    (dropout): Dropout(p=0.6)                                                                                                                                                             
    (fc1): Linear(in_features=96, out_features=1, bias=True)                                                                                                                              
  )                                                                                                                                                                                       
  (lstm): RNN(                                                                                                                                                                        
    (embedding): Embedding(19410, 300, padding_idx=0)                                                                                                                                     
    (rnn): LSTM(300, 150, batch_first=True, bidirectional=True)                                                                                                                           
    (attention): Attention(                                                                                                                                                               
      (dense): Linear(in_features=300, out_features=1, bias=True)                                                                                                                         
      (tanh): Tanh()                                                                                                                                                                      
      (softmax): Softmax()                                                                                                                                                                
    )                                                                                                                                                                                     
    (fc1): Linear(in_features=300, out_features=50, bias=True)                                                                                                                            
    (dropout): Dropout(p=0.5)                                                                                                                                                             
    (fc2): Linear(in_features=50, out_features=1, bias=True)                                                                                                                              
  )                                                                                                                                                                                       
  (fc1): Linear(in_features=146, out_features=1, bias=True)                                                                                                                               
)

I have trained the RNN and the CNN separately on the same dataset and saved their weights. In the combined model, I load those weights using the following function:

def load_pretrained_weights(self, model='cnn', path=None):
    if model not in ['cnn', 'rnn']:
        raise AttributeError("Model must be either rnn or cnn")
    if model == 'cnn':
        self.cnn.load_state_dict(torch.load(path))
    if model == 'rnn':
        self.lstm.load_state_dict(torch.load(path))
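
A quick sanity check that the weights are actually loaded is to compare one parameter before and after the call (a sketch; cnn.fc1.weight is just an arbitrarily chosen parameter):

import torch

# One parameter should change after loading, assuming the saved file
# holds values different from the fresh initialization.
before = model.cnn.fc1.weight.detach().clone()
model.load_pretrained_weights('cnn', 'models/cnn2.pth')
print(torch.equal(before, model.cnn.fc1.weight))  # expected: False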

I then freeze the submodules using this function:

def freeze(self):    
    for p in self.cnn.parameters():
        p.requires_grad = False
    for p in self.lstm.parameters():
        p.requires_grad = False
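
With the submodules frozen, only the still-trainable parameters (the final linear layer) need to be passed to the optimizer. A minimal sketch of that step (the optimizer type and learning rate are placeholders):

import torch

# Only parameters with requires_grad=True are handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)  # placeholder choice of optimizer and lr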

Then I trained the combined model and got a better result than either submodule trained and evaluated alone. I used an early-stopping technique in my epoch loop to save the best parameters, roughly as in the sketch below. After training, I created a new instance of the same class, but when I load the saved “best” parameters I do not get a similar result. I tried the same thing with each submodule (RNN and CNNText here) alone and it worked, but in this case it does not give the same performance.
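
The relevant part of the epoch loop (a minimal sketch; train_one_epoch and evaluate are assumed helpers, and copy.deepcopy is one way to keep the best state from being overwritten):

import copy
import torch

best_val_loss = float('inf')
best_state = None

for epoch in range(num_epochs):
    train_one_epoch(model, train_loader)      # assumed training helper
    val_loss = evaluate(model, val_loader)    # assumed validation helper
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        # deepcopy so later epochs do not overwrite the saved tensors
        best_state = copy.deepcopy(model.state_dict())

torch.save(best_state, MODEL_PATH)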

Please help me understand what is happening here. I am new to deep learning. Thank you.

A few experiments I tried:

  1. I loaded the saved weights of each submodule and then loaded the best parameters; the result was somewhat close to the best one.
  2. I took the hidden layer from each submodule before applying the dropout; this was better than the previous experiment, but still not the best.

EDIT

The __init__ function of my class is as follows; RNN and CNNText are just standard implementations.

class CNNLSTM(nn.Module):

    def __init__(self, vocab_size, embedding_dim, embedding_weight, rnn_arch,
                 isCuda=True, class_num=1, kernel_num=32, kernel_sizes=[3, 4, 5],
                 train_wv=False, rnn_num_layers=1, rnn_bidirectional=True,
                 rnn_use_attention=True):

        super(CNNLSTM, self).__init__()
        self.cnn = CNNText(vocab_size, embedding_dim, embedding_weight, class_num,
                           kernel_num=kernel_num, kernel_sizes=kernel_sizes,
                           static=train_wv, dropout=0.6)
        self.lstm = RNN(rnn_arch, vocab_size, embedding_dim, embedding_weight,
                        num_layers=rnn_num_layers, rnn_unit='lstm',
                        embedding_train=train_wv, isCuda=isCuda,
                        bidirectional=rnn_bidirectional, use_padding=True,
                        use_attention=rnn_use_attention, num_class=class_num)
        self.fc1 = nn.Linear(rnn_arch[-1] + len(kernel_sizes) * kernel_num, class_num)
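
The forward method is not shown here; based on the fc1 dimensions it presumably concatenates the 96-dimensional CNN features with the 50-dimensional RNN features before the final linear layer. A minimal sketch of that idea (extract_features is a hypothetical helper name, not the actual method):

    def forward(self, x):
        # Sketch only: assumes each submodule can expose its pre-classifier features
        # (96-dim from the CNN branch, 50-dim from the RNN branch, 146 in total).
        cnn_feats = self.cnn.extract_features(x)   # hypothetical helper, shape (batch, 96)
        rnn_feats = self.lstm.extract_features(x)  # hypothetical helper, shape (batch, 50)
        return self.fc1(torch.cat([cnn_feats, rnn_feats], dim=1))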

After constructing the object, I loaded the individual pre-trained submodules:

model.load_pretrained_weights('rnn', 'models/bilstm_2_atten.pth')
model.load_pretrained_weights('cnn', 'models/cnn2.pth')

model.freeze()

Then I trained only the last linear layer. I saved the model parameters with:

torch.save(model.state_dict(), path)

The 'best' result occurred around the third or fourth epoch from the end. After training, I loaded the parameters for the best result with:

state_dict = torch.load(MODEL_PATH)
model.load_state_dict(state_dict)

After loading the model, you need to call model.eval():

state_dict = torch.load(MODEL_PATH)
model.load_state_dict(state_dict)
model.eval()

Reference: PyTorch documentation

This is what it says:


When saving a model for inference, it is only necessary to save the trained model's learned parameters. Saving the model's state_dict with the torch.save() function will give you the most flexibility for restoring the model later, which is why it is the recommended method for saving models.

A common PyTorch convention is to save models using either a .pt or .pth file extension.

Remember that you must call model.eval() to set dropout and batch normalization layers to evaluation mode before running inference. Failing to do this will yield inconsistent inference results.
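
A small self-contained example of why this matters for the Dropout layers in both submodules (a sketch, unrelated to the model above):

import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()    # training mode: elements are randomly zeroed and the rest rescaled
print(dropout(x))  # varies from call to call

dropout.eval()     # evaluation mode: dropout is the identity
print(dropout(x))  # deterministic, equal to x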

