
pytorch: GRU cannot update hidden_state in-place

I ran into a problem when using PyTorch to implement a GRU network.

My code is below:

import torch

class GRU_model(torch.nn.Module):
    def __init__(self, device):
        super(GRU_model, self).__init__()
        self.h = torch.randn((1,1,5), device=device, dtype=torch.float)

        self.GRU_1 = torch.nn.GRU(input_size=5, hidden_size=5)

    def forward(self, a):
        output, self.h = self.GRU_1(a, self.h)
        return output

if __name__ == '__main__':
    learn_rate=1e-4
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = GRU_model(device).to(device=device)
    optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)

    for i in range(10):
        a = torch.randn((1, 1, 5), device=device, dtype=torch.float)
        output = model(a)
        loss = (a - output).mean()

        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()

and I got an error like this:

Traceback (most recent call last):
  File "C:/Users/Administrator_/Desktop/Graduation_Project/MIDI_Music_style_transfer/GRU_toy_in-place_hidden_states_change/main.py", line 40, in <module>
    loss.backward(retain_graph=True)
  File "C:\Users\Administrator_\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\Administrator_\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\autograd\__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [15, 5]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I just want to update the hidden_state in the GRU after each epoch, but it just doesn't work!

I'd really appreciate it if you could help me out!

In PyTorch, a computation graph is created for each iteration in an epoch. In each iteration, we execute the forward pass, compute the derivatives of the loss with respect to the parameters of the network, and update the parameters to fit the given examples. After the backward pass, the graph is freed to save memory. In the next iteration, a fresh graph is created, ready for back-propagation.

Because the computation graph is freed by default after the first backward pass, you will get an error if you try to backpropagate through the same graph a second time. That is why the following error message pops up:

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time

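As a minimal standalone sketch of that behaviour (not specific to the GRU code above), calling backward twice on the same graph only works if the first call retains it:

import torch

# Calling backward twice on the same graph: the second call fails unless the
# first call keeps the graph alive with retain_graph=True.
x = torch.randn(3, requires_grad=True)
y = (x * x).sum()

y.backward(retain_graph=True)  # the graph is kept alive
y.backward()                   # works only because the graph was retained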

In your case, after specifying retain_graph=True you see:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [15, 5]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
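To see what the version numbers in that message mean, here is a minimal standalone sketch of the same check, with a plain tensor standing in for a saved tensor such as the GRU weight matrix that optimizer.step() updates in-place:

import torch

# A tensor saved for the backward pass is modified in-place after the forward
# pass; autograd detects the changed version counter and raises the same error.
w = torch.randn(3, requires_grad=True)
y = (w * w).sum()      # w is saved for computing the gradient
with torch.no_grad():
    w.add_(1.0)        # in-place update, analogous to optimizer.step() on the weights
y.backward()           # RuntimeError: ... modified by an inplace operation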

The problem occurs because you update self.h in the forward pass and carry it over to the next iteration: self.h keeps the previous iteration's computation graph alive, and by the time you backpropagate through that graph again, the GRU weights (the [15, 5] tensor in the message) have already been modified in-place by optimizer.step(). You shouldn't keep tensors that are needed for gradient computation attached across iterations. This version, which passes the hidden state explicitly and creates it fresh each iteration, should work:

import torch


class GRU_model(torch.nn.Module):
    def __init__(self, device):
        super(GRU_model, self).__init__()
        self.GRU_1 = torch.nn.GRU(input_size=5, hidden_size=5)

    def forward(self, a, h):
        output, hh = self.GRU_1(a, h)
        return output, hh


if __name__ == '__main__':
    learn_rate = 1e-4
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = GRU_model(device).to(device=device)
    model.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)

    def mse(output, target):
        loss = torch.mean((output - target)**2)
        return loss

    for i in range(10):
        a = torch.randn((1, 1, 5), device=device, dtype=torch.float, requires_grad=True)
        target = torch.randn((1, 1, 5), device=device)
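        # the hidden state is re-created every iteration, so no graph is carried over between steps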
        h = torch.randn((1, 1, 5), device=device)

        optimizer.zero_grad()
        output, h = model(a, h)
        loss = mse(output, target)

        loss.backward(retain_graph=True)
        optimizer.step()

The real problem is that the hidden_state should not participate in the gradient backpropagation across iterations. So simply adding one line, self.h = self.h.detach(), as below will solve the problem:

    def forward(self, a):
        output, self.h = self.GRU_1(a, self.h)
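        # detach so the stored hidden state no longer references this iteration's graph;
        # the next backward pass will not reach back into it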
        self.h = self.h.detach()
        return output
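With the detach in place, the original training loop from the question runs as-is, and retain_graph=True is no longer needed; a minimal sketch reusing the model, optimizer, and device set up above:

for i in range(10):
    a = torch.randn((1, 1, 5), device=device, dtype=torch.float)
    output = model(a)
    loss = (a - output).mean()

    optimizer.zero_grad()
    loss.backward()  # no retain_graph: the graph ends at the detached hidden state
    optimizer.step()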
