Gradient Computation broken by Sigmoid function in Pytorch
Hey, I have been struggling with this weird problem. Here is my code for the neural net:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv_3d_ = nn.Sequential(
            nn.Conv3d(1, 1, 9, 1, 4),
            nn.LeakyReLU(),
            nn.Conv3d(1, 1, 9, 1, 4),
            nn.LeakyReLU(),
            nn.Conv3d(1, 1, 9, 1, 4),
            nn.LeakyReLU()
        )
        self.linear_layers_ = nn.Sequential(
            nn.Linear(batch_size*32*32*32, batch_size*32*32*3),
            nn.LeakyReLU(),
            nn.Linear(batch_size*32*32*3, batch_size*32*32*3),
            nn.Sigmoid()
        )

    def forward(self, x, y, z):
        conv_layer = x + y + z
        conv_layer = self.conv_3d_(conv_layer)
        conv_layer = torch.flatten(conv_layer)
        conv_layer = self.linear_layers_(conv_layer)
        conv_layer = conv_layer.view((batch_size, 3, input_sizes, input_sizes))
        return conv_layer
The weird problem I am facing is that running this NN gives me the error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3072]], which is output 0 of SigmoidBackward, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
The stack trace shows that the issue is in the line
conv_layer = self.linear_layers_(conv_layer)
However, if I replace the last activation function of my FCN from nn.Sigmoid() to nn.LeakyReLU(), the NN executes properly.
Can anyone tell me why the Sigmoid activation function is causing my backward computation to break?
I found the problem with my code. I delved deeper into what in-place actually means. So, if you check the line
conv_layer = self.linear_layers_(conv_layer)
because of the assignment, linear_layers_ changes the values of conv_layer in place; the original values get overwritten, and as a result gradient computation fails. An easy solution for this problem is to use the clone() function, i.e.
conv_layer = self.linear_layers_(conv_layer).clone()
This creates a copy of the right-hand-side computation, so Autograd is able to keep a valid reference to the computation graph.
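To see the same failure in isolation, here is a minimal sketch (not the model above, just an illustration of the mechanism): Sigmoid saves its *output* for the backward pass (its gradient is out * (1 - out)), so mutating that output in place bumps the tensor's version counter and backward() raises the same "modified by an inplace operation" error, while clone() decouples later edits from the saved tensor.

```python
import torch

# In-place edit of Sigmoid's saved output -> backward() fails.
x = torch.ones(3, requires_grad=True)
out = torch.sigmoid(x)
out += 1.0                       # mutates the tensor Autograd saved
try:
    out.sum().backward()
    failed = False
except RuntimeError:
    failed = True                # version-counter mismatch detected

# clone() first -> the in-place edit touches only the copy.
x2 = torch.ones(3, requires_grad=True)
out2 = torch.sigmoid(x2).clone()
out2 += 1.0                      # edits the clone, not Sigmoid's output
out2.sum().backward()            # succeeds; x2.grad is populated
```

This also hints at why nn.LeakyReLU() masks the problem: its backward pass does not depend on the layer's output the same way, so an in-place edit of the output does not invalidate anything Sigmoid-style.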