PyTorch: GRU cannot update hidden_state in-place
I ran into a problem while implementing a GRU network with PyTorch.
My code is as follows:
import torch

class GRU_model(torch.nn.Module):
    def __init__(self, device):
        super(GRU_model, self).__init__()
        self.h = torch.randn((1,1,5), device=device, dtype=torch.float)
        self.GRU_1 = torch.nn.GRU(input_size=5, hidden_size=5)

    def forward(self, a):
        output, self.h = self.GRU_1(a, self.h)
        return output

if __name__ == '__main__':
    learn_rate=1e-4
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = GRU_model(device).to(device=device)
    optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)
    for i in range(10):
        a = torch.randn((1, 1, 5), device=device, dtype=torch.float)
        output = model(a)
        loss = (a - output).mean()
        optimizer.zero_grad()
        loss.backward(retain_graph=True)
        optimizer.step()
I get the following error:
Traceback (most recent call last):
  File "C:/Users/Administrator_/Desktop/Graduation_Project/MIDI_Music_style_transfer/GRU_toy_in-place_hidden_states_change/main.py", line 40, in <module>
    loss.backward(retain_graph=True)
  File "C:\Users\Administrator_\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\Administrator_\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\autograd\__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [15, 5]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
I just want to update the GRU's hidden_state after each epoch, but it doesn't work!
I'd be very grateful for any help!
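In PyTorch, a computation graph is created for every iteration in an epoch. In each iteration we run the forward pass, compute the derivatives of the loss with respect to the network parameters, and update the parameters to fit the given example. After the backward pass, the graph is freed to save memory. In the next iteration, a brand-new graph is built and made ready for backpropagation.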
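Because the computation graph is freed by default after the first backward pass, you get an error if you try to backpropagate through the same graph a second time. That is why the following error message appears: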
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time
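A minimal sketch of this behavior, assuming a toy graph built from hypothetical tensors w and x (they are not from the question). sigmoid saves its output for the backward pass, so a second backward() through the same graph fails once those saved buffers are freed, unless the first call passed retain_graph=True.

```python
import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

# sigmoid saves its output for the backward pass, so the graph holds a buffer
y = (w * x).sigmoid().sum()
y.backward()  # first backward: OK; the saved buffers are freed afterwards

try:
    y.backward()  # second backward through the same (already freed) graph
    failed_without_retain = False
except RuntimeError as err:
    failed_without_retain = True
    print("RuntimeError:", err)

# With retain_graph=True the buffers are kept, so a second backward succeeds
w.grad = None
z = (w * x).sigmoid().sum()
z.backward(retain_graph=True)
first_grad = w.grad.clone()
z.backward()  # works; gradients accumulate into w.grad
doubled = torch.allclose(w.grad, 2 * first_grad)
```

Note that the second call accumulates into w.grad rather than overwriting it, which is why the gradient ends up doubled.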
In your case, after specifying retain_graph=True, you see:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [15, 5]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
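This problem occurs because you update self.h in the forward pass. The new self.h is an output of the current iteration's graph, so the next iteration's graph is chained onto it, and backward(retain_graph=True) then also walks back through the previous iteration's graph. Tensors saved by that older graph are needed for the gradient computation and must not be modified in place, but optimizer.step() has already updated the GRU weights in place. The [15, 5] tensor in the message matches the shape of the GRU's weight matrices (3 × hidden_size by input_size). This should work: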
import torch

class GRU_model(torch.nn.Module):
    def __init__(self, device):
        super(GRU_model, self).__init__()
        self.GRU_1 = torch.nn.GRU(input_size=5, hidden_size=5)

    def forward(self, a, h):
        # Take the hidden state as an argument and return the new one,
        # instead of storing it on the module
        output, hh = self.GRU_1(a, h)
        return output, hh

if __name__ == '__main__':
    learn_rate = 1e-4
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = GRU_model(device).to(device=device)
    model.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)

    def mse(output, target):
        loss = torch.mean((output - target)**2)
        return loss

    for i in range(10):
        a = torch.randn((1, 1, 5), device=device, dtype=torch.float, requires_grad=True)
        target = torch.randn((1, 1, 5), device=device)
        # A fresh hidden state every iteration, so no graph is shared between
        # iterations (this also means the state is not carried over)
        h = torch.randn((1, 1, 5), device=device)
        optimizer.zero_grad()
        output, h = model(a, h)
        loss = mse(output, target)
        loss.backward()  # each graph is used only once, so retain_graph=True is unnecessary
        optimizer.step()
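The "modified by an inplace operation" error can be reproduced without a GRU. The sketch below assumes a hypothetical one-parameter model w trained with SGD, purely for illustration: the retained graph saves w, optimizer.step() then updates w in place (bumping its version counter), and backpropagating through the old graph again raises the same error seen in the question.

```python
import torch

# Hypothetical single-parameter example (not from the question)
w = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.SGD([w], lr=0.1)

y = (w * w).sum()              # mul saves w for the backward pass
y.backward(retain_graph=True)  # graph is kept alive
opt.step()                     # updates w in place, bumping its version counter

try:
    y.backward()               # the retained graph still expects the old version of w
    inplace_error = False
except RuntimeError as err:
    inplace_error = "modified by an inplace operation" in str(err)
    print(err)
```

In the question, the retained graph of the previous iteration plays the role of y, and the GRU weights play the role of w.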
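The real problem is that the hidden state carried over from the previous iteration should not take part in backpropagation. So simply adding the line self.h = self.h.detach(), as below, solves the problem: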
def forward(self, a):
    output, self.h = self.GRU_1(a, self.h)
    # Detach so the next iteration does not backpropagate into this graph
    self.h = self.h.detach()
    return output
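For completeness, here is a sketch of the question's training loop with the detach() fix applied (the losses list is added only for inspection). Because self.h no longer links iterations together, each graph is backpropagated exactly once, so retain_graph=True can be dropped as well.

```python
import torch

class GRU_model(torch.nn.Module):
    def __init__(self, device):
        super().__init__()
        # Hidden state kept on the module between calls, as in the question
        self.h = torch.randn((1, 1, 5), device=device, dtype=torch.float)
        self.GRU_1 = torch.nn.GRU(input_size=5, hidden_size=5)

    def forward(self, a):
        output, self.h = self.GRU_1(a, self.h)
        # Cut the hidden state off from this iteration's graph
        self.h = self.h.detach()
        return output

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = GRU_model(device).to(device=device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

losses = []
for i in range(10):
    a = torch.randn((1, 1, 5), device=device, dtype=torch.float)
    output = model(a)
    loss = (a - output).mean()
    optimizer.zero_grad()
    loss.backward()  # retain_graph=True is no longer needed
    optimizer.step()
    losses.append(loss.item())
```

Detaching keeps the hidden state's values across iterations but stops gradients at each iteration boundary, which amounts to truncated backpropagation through time with a window of one step.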