Can autograd in pytorch handle a repeated use of a layer within the same module?
Suppose I have a layer layer in a torch module and use it two or more times during a single forward step, in such a way that the output of this layer is later fed back into the same layer. Can pytorch's autograd compute the gradient of this layer's weights correctly?
Here is an MWE of what I am talking about:
import torch
import torch.nn as nn
import torch.nn.functional as F

class net(nn.Module):
    def __init__(self,in_dim,out_dim):
        super(net,self).__init__()
        self.layer = nn.Linear(in_dim,out_dim,bias=False)

    def forward(self,x):
        # The same layer (same weight) is applied twice in a row
        x = self.layer(x)
        x = self.layer(x)
        return x

input_x = torch.tensor([10.])
label = torch.tensor([5.])
n = net(1,1)
loss_fn = nn.MSELoss()

out = n(input_x)
loss = loss_fn(out,label)
n.zero_grad()
loss.backward()

# The module has a single scalar parameter: the weight of the layer
for param in n.parameters():
    w = param.item()
    g = param.grad

print('Input = %.4f; label = %.4f'%(input_x,label))
print('Weight = %.4f; output = %.4f'%(w,out))
print('Gradient w.r.t. the weight is %.4f'%(g))
# Hand-computed gradient for comparison
print('And it should be %.4f'%(4*(w**2*input_x-label)*w*input_x))
And the output is (it may be different on your computer if the initial value of the weight is different):
Input = 10.0000; label = 5.0000
Weight = 0.9472; output = 8.9717
Gradient w.r.t. the weight is 150.4767
And it should be 150.4766
In this example, I have defined a module with only one linear layer (in_dim=out_dim=1 and no bias). w is the weight of this layer; input_x is the input value; label is the desired value. Since the loss is chosen as MSE, the formula for the loss is
((w^2)*input_x-label)^2
Computing by hand, we have
d(loss)/dw = 2*((w^2)*input_x-label)*(2*w*input_x)
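The factor 2*w*input_x in this derivative can be read as the sum of two equal contributions, one for each use of the layer. A sketch of that split, writing x for input_x, y for label, L for the loss, and h = w x for the intermediate activation (h and L are names introduced here for illustration only):

```latex
\begin{aligned}
L &= (w h - y)^2, \qquad h = w x \\
\frac{dL}{dw}
  &= 2\,(w h - y)\left(\underbrace{h}_{\text{second use}}
     + \underbrace{w \cdot x}_{\text{first use}}\right) \\
  &= 2\,(w^2 x - y)\,(2 w x)
\end{aligned}
```

The "second use" term treats h as fixed and differentiates the outer application; the "first use" term goes through h and is scaled by the outer weight w. In this scalar case both terms equal w x.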
The output of my example above shows that autograd gives the same result as the hand computation, which gives me reason to believe that it works in this case. But in a real application, the layer may have inputs and outputs of higher dimensions, a nonlinear activation function after it, and the neural network could have multiple layers.
What I want to ask is: can I trust autograd to handle situations like this one, but much more complicated than my example? How does it work when a layer is called iteratively?
This will work just fine. From the perspective of the autograd engine this isn't a cyclic application, since the resulting computation graph unrolls the repeated computation into a linear sequence. To illustrate this, for a single layer you might have:
x -----> layer --------+
           ^           |
           |  2 times  |
           +-----------+
From the autograd perspective this looks like:
x ---> layer ---> layer ---> layer
Here layer is the same layer, appearing three times in the graph. This means that when computing the gradient for the layer's weights, contributions are accumulated from all three stages. So when using backward:
x ---> layer ---> layer ---> layer ---> loss_func
                                            |
       lback <--- lback <--- lback <--------+
         |          |          |
         |          v          |
         +------> weights <----+
                   _grad
Here lback represents the local derivative of the layer forward transformation, which takes the upstream gradient as an input. Each one adds to the layer's weights_grad.
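One way to see this accumulation is to compare the gradient of a shared layer with the summed gradients of separate layers that hold the same weight values. The following is a minimal sketch, not part of the original answer; the sizes, the tanh activation, the squared-sum loss, and the names shared_grad, copies, and summed_grad are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One shared layer applied three times, as in the diagram.
layer = nn.Linear(4, 4, bias=False)
x = torch.randn(2, 4)

out = x
for _ in range(3):
    out = torch.tanh(layer(out))
out.pow(2).sum().backward()
shared_grad = layer.weight.grad.clone()

# Untied version: three separate layers initialised with the same weights.
copies = [nn.Linear(4, 4, bias=False) for _ in range(3)]
with torch.no_grad():
    for c in copies:
        c.weight.copy_(layer.weight)

out = x
for c in copies:
    out = torch.tanh(c(out))
out.pow(2).sum().backward()

# Each copy receives the gradient of exactly one stage; their sum should
# match what autograd accumulated into the shared weight.
summed_grad = sum(c.weight.grad for c in copies)
print(torch.allclose(shared_grad, summed_grad, atol=1e-5))
```

If the two gradients agree up to floating-point tolerance, that is consistent with the picture above: each stage contributes its own term and the terms are added into the same weights_grad.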
Recurrent Neural Networks are built on this repeated application of layers (cells). See for example this tutorial about Classifying Names with a Character-Level RNN.
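For the higher-dimensional, nonlinear case raised in the question, torch.autograd.gradcheck can compare autograd's gradients against finite differences. The sketch below is only an illustration under assumed settings: run_cell is a hypothetical recurrent-style function, and dim, steps, and the tanh cell are arbitrary choices rather than anything from the linked tutorial.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, steps = 3, 5

# gradcheck compares analytic and numerical gradients, so it needs float64.
weight = torch.randn(dim, 2 * dim, dtype=torch.double, requires_grad=True)
bias = torch.randn(dim, dtype=torch.double, requires_grad=True)
seq = torch.randn(steps, dim, dtype=torch.double)

def run_cell(weight, bias):
    # The same weight and bias are reused at every time step.
    h = torch.zeros(dim, dtype=torch.double)
    for x_t in seq:
        h = torch.tanh(F.linear(torch.cat([x_t, h]), weight, bias))
    return h

# Returns True when the gradients match within tolerance; raises otherwise.
ok = torch.autograd.gradcheck(run_cell, (weight, bias))
print(ok)
```

A check like this is useful as a one-off sanity test on small sizes; it is too slow to run on a full-sized network.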