
How to get rid of Variable API in PyTorch.autograd?

I am forwarding and backpropagating tensor data X through two simple nn.Module PyTorch model instances, model1 and model2.

I can't get this process to work without using the deprecated Variable API.

So this works just fine:

    y1 = model1(X)
    v = Variable(y1.data, requires_grad=training)         # It's all about this line!
    y2 = model2(v)
    criterion = nn.NLLLoss()
    loss = criterion(y2, y)
    loss.backward()
    y1.backward(v.grad)
    self.step()

But this will throw an error:

    y1 = model1(X)
    y2 = model2(y1)
    criterion = nn.NLLLoss()
    loss = criterion(y2, y)
    loss.backward()
    y1.backward(y1.grad) # it breaks here
    self.step()
>>> RuntimeError: grad can be implicitly created only for scalar outputs

I just can't seem to find a relevant difference between v in the first implementation and y1 in the second. In both cases requires_grad is set to True. The only thing I could find was that y1.grad_fn=<ThnnConv2DBackward> and v.grad_fn=<ThnnConv2DBackward>.

What am I missing here? What (tensor attributes?) do I not know about, and if Variable is deprecated, what other implementation would work?

[UPDATED] You are not correctly passing y1.grad into y1.backward in the second example. After the first backward pass all the intermediate gradients are destroyed; you need a special hook to extract those gradients. In your case you are passing a None value. Here is a small example to reproduce your case:

Code:

import torch
import torch.nn as nn


torch.manual_seed(42)


class Model1(nn.Module):

    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x.pow(3)


class Model2(nn.Module):

    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x / 2


model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()

X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)

y1 = model1(X)
y2 = model2(y1)

loss = criterion(y2, y)
# We are going to backprop twice, so we need
# retain_graph=True on the first backward
loss.backward(retain_graph=True)

try:
    y1.backward(y1.grad)
except RuntimeError as err:
    print(err)
    print('y1.grad: ', y1.grad)

Output:

grad can be implicitly created only for scalar outputs
y1.grad:  None

So you need to extract them correctly (by default PyTorch does not keep .grad for non-leaf tensors, which is why y1.grad is None above):

Code:

def extract(V):
    """Gradient extractor.
    """
    def hook(grad):
        V.grad = grad
    return hook


model1 = Model1()
model2 = Model2()
criterion = nn.MSELoss()

X = torch.randn(1, 5, requires_grad=True)
y = torch.randn(1, 5)

y1 = model1(X)
y2 = model2(y1)

loss = criterion(y2, y)
y1.register_hook(extract(y1))
loss.backward(retain_graph=True)

print('y1.grad: ', y1.grad)

y1.backward(y1.grad)

Output:

y1.grad:  tensor([[-0.1763, -0.2114, -0.0266, -0.3293,  0.0534]])
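
As a side note, instead of a manual hook you can ask autograd to keep the intermediate gradient with retain_grad(), which populates .grad even for a non-leaf tensor. A minimal sketch of that variant, reusing the Model1, Model2, criterion, X and y defined above:

Code:

y1 = model1(X)
y2 = model2(y1)

loss = criterion(y2, y)
y1.retain_grad()                   # keep .grad for this non-leaf tensor
loss.backward(retain_graph=True)   # retain the graph for the second backward

print('y1.grad: ', y1.grad)        # now a tensor instead of None
y1.backward(y1.grad)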

After some investigation I came to the following two solutions. The solution provided elsewhere in this thread retained the computation graph manually, without an option to free it, so it ran fine initially but caused OOM errors later on.

The first solution is to tie the models together using the built-in torch.nn.Sequential, like so:

model = torch.nn.Sequential(Model1(), Model2())

It's as easy as that. It looks clean and behaves exactly like an ordinary model would.
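
For completeness, here is a self-contained sketch of a training step with such a combined model. The nn.Linear layers and the SGD optimizer are just placeholders (the toy Model1/Model2 above have no trainable parameters, so an optimizer would have nothing to update):

import torch
import torch.nn as nn

# Hypothetical stand-ins with trainable parameters for model1 and model2.
model = nn.Sequential(nn.Linear(5, 8), nn.Linear(8, 5))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

X = torch.randn(1, 5)
y = torch.randn(1, 5)

y2 = model(X)            # one forward pass through both sub-modules
loss = criterion(y2, y)
loss.backward()          # one backward pass, no manual gradient plumbing
optimizer.step()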

The alternative is simply to tie them together manually:

model1 = Model1()
model2 = Model2()
y1 = model1(X)
y2 = model2(y1)
loss = criterion(y2, y)
loss.backward()

My fear that this would only backpropagate through model2 turned out to be unfounded, since model1 is also stored in the computation graph that is backpropagated over. Compared to the previous implementation, this one makes the interface between the two models more transparent.
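
A quick way to convince yourself of this is to check that both modules receive gradients after the single backward pass. A minimal sketch, using two hypothetical nn.Linear layers in place of the real models:

import torch
import torch.nn as nn

model1 = nn.Linear(5, 8)   # hypothetical stand-in for the real model1
model2 = nn.Linear(8, 5)   # hypothetical stand-in for the real model2
criterion = nn.MSELoss()

X = torch.randn(1, 5)
y = torch.randn(1, 5)

loss = criterion(model2(model1(X)), y)
loss.backward()

# Gradients flowed through model2 back into model1.
print(model1.weight.grad is not None)  # True
print(model2.weight.grad is not None)  # True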
