How to get rid of Variable API in PyTorch.autograd?
I am forwarding and backpropagating tensor data X through two simple nn.Module PyTorch model instances, model1 and model2.
I can't get this process to work without using the deprecated Variable API.
So this works just fine:
y1 = model1(X)
v = Variable(y1.data, requires_grad=training) # It's all about this line!
y2 = model2(v)
criterion = nn.NLLLoss()
loss = criterion(y2, y)
loss.backward()
y1.backward(v.grad)
self.step()
But this throws an error:
y1 = model1(X)
y2 = model2(y1)
criterion = nn.NLLLoss()
loss = criterion(y2, y)
loss.backward()
y1.backward(y1.grad) # it breaks here
self.step()
>>> RuntimeError: grad can be implicitly created only for scalar outputs
I just can't seem to find a relevant difference between v in the first implementation and y1 in the second. In both cases requires_grad is set to True. The only thing I could find was that y1.grad_fn=<ThnnConv2DBackward> and v.grad_fn=<ThnnConv2DBackward>
What am I missing here? What (tensor attributes?) do I not know about, and if Variable is deprecated, what other implementation would work?
[UPDATED] You are not correctly passing y1.grad into y1.backward in the second example. After the first backward, the gradients of intermediate (non-leaf) tensors are not kept, so you need a hook to extract them. In your case y1.grad is None, and that is the value you are passing. Here is a small example to reproduce your case:
Code:
import torch
import torch.nn as nn

torch.manual_seed(42)

class Model1(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x.pow(3)

class Model2(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x / 2

model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()
X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
# We are going to backprop 2 times, so we need to
# retain_graph=True while first backward
loss.backward(retain_graph=True)
try:
    y1.backward(y1.grad)
except RuntimeError as err:
    print(err)
print('y1.grad: ', y1.grad)
Output:
grad can be implicitly created only for scalar outputs
y1.grad: None
So you need to extract them correctly:
Code:
def extract(V):
    """Gradient extractor.
    """
    def hook(grad):
        V.grad = grad
    return hook

model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()
X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
y1.register_hook(extract(y1))
loss.backward(retain_graph=True)
print('y1.grad', y1.grad)
y1.backward(y1.grad)
Output:
y1.grad: tensor([[-0.1763, -0.2114, -0.0266, -0.3293, 0.0534]])
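As a possible alternative to writing the hook by hand, the sketch below uses Tensor.retain_grad(), which asks autograd to keep the gradient of a non-leaf tensor after backward. The plain tensor expressions here are stand-ins for model1 and model2 from the example above, and the seed is arbitrary; this is only meant to illustrate the idea, not to replace the code above.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, chosen only for reproducibility

X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)

y1 = X.pow(3)      # stand-in for model1(X); y1 is a non-leaf tensor
y1.retain_grad()   # keep y1.grad after backward instead of discarding it
y2 = y1 / 2        # stand-in for model2(y1)
loss = torch.nn.functional.mse_loss(y2, y)

loss.backward()

# y1.grad is now populated, without a custom hook
print('y1.grad: ', y1.grad)
```

With the mean-reduced MSE loss over five elements, the retained gradient should equal (y2 - y) / 5, which is an easy way to sanity-check it.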
After some investigation I came to the following two solutions. The solution provided elsewhere in this thread retained the computation graph manually, without an option to free it, thus running fine initially but causing OOM errors later on.
The first solution is to tie the models together using the built-in torch.nn.Sequential, as such:
model = torch.nn.Sequential(Model1(), Model2())
It's as easy as that. It looks clean and behaves exactly like an ordinary model would.
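A minimal sketch of a full training step with the Sequential approach might look like the following. The nn.Linear layers, tensor shapes, learning rate, and SGD optimizer are assumptions made for illustration (the original models are convolutional and are not shown in the thread); the point is only that a single backward call fills in gradients for both sub-models.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for model1 and model2, with trainable parameters
model1 = nn.Linear(5, 4)
model2 = nn.Sequential(nn.Linear(4, 3), nn.LogSoftmax(dim=1))

model = nn.Sequential(model1, model2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.NLLLoss()

X = torch.randn(8, 5)            # batch of 8 samples, 5 features each
y = torch.randint(0, 3, (8,))    # class indices in {0, 1, 2}

optimizer.zero_grad()
loss = criterion(model(X), y)
loss.backward()                  # one backward pass covers both sub-models
optimizer.step()

# Every parameter of both sub-models received a gradient
has_grads = all(p.grad is not None for p in model.parameters())
print(has_grads)
```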
The alternative is to simply tie them together manually:
model1 = Model1()
model2 = Model2()
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
loss.backward()
My fear that this would only backpropagate through model2 turned out to be unfounded, since model1 is also stored in the computation graph that is backpropagated over. This implementation increased the transparency of the interface between the two models, compared to the previous implementation.
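To check this, the sketch below compares the gradient that model1 receives from a single loss.backward() with the gradient from the two-stage scheme in the question. It assumes that y1.detach().requires_grad_(True) can serve as a non-deprecated counterpart of Variable(y1.data, requires_grad=True); the nn.Linear layers, shapes, and MSE loss are hypothetical stand-ins for the real models.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model1 = nn.Linear(5, 4)   # hypothetical stand-in for model1
model2 = nn.Linear(4, 3)   # hypothetical stand-in for model2
criterion = nn.MSELoss()
X = torch.randn(8, 5)
y = torch.randn(8, 3)

# Single pass: one graph spans both models
loss = criterion(model2(model1(X)), y)
loss.backward()
single_pass_grad = model1.weight.grad.clone()

# Two-stage pass: cut the graph between the models, as in the question
model1.zero_grad()
model2.zero_grad()
y1 = model1(X)
v = y1.detach().requires_grad_(True)  # new leaf tensor sharing y1's values
loss = criterion(model2(v), y)
loss.backward()        # fills v.grad and model2's parameter gradients
y1.backward(v.grad)    # continues backpropagation into model1
two_stage_grad = model1.weight.grad.clone()

print(torch.allclose(single_pass_grad, two_stage_grad))
```

If the two gradients agree, the split buys nothing for plain training, which is consistent with simply chaining the models and calling loss.backward() once.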