
Pytorch: "Model Weights not Changing"

Can someone help me understand why the weights are not updating?

    import torch
    from torch.autograd import Variable  # Variable is deprecated since PyTorch 0.4; plain tensors also work

    unet = Unet()
    optimizer = torch.optim.Adam(unet.parameters(), lr=0.001)
    loss_fn = torch.nn.MSELoss()
    input = Variable(torch.randn(32, 1, 64, 64, 64), requires_grad=True)
    target = Variable(torch.randn(32, 1, 64, 64, 64), requires_grad=False)

    optimizer.zero_grad()
    y_pred = unet(input)
    y = target[:, :, 20:44, 20:44, 20:44]

    loss = loss_fn(y_pred, y)
    print(unet.conv1.weight.data[0][0])  # weights of the first layer in the unet
    loss.backward()
    optimizer.step()
    print(unet.conv1.weight.data[0][0])  # weights haven't changed

The model is defined as follows:

    import torch.nn as nn

    class Unet(nn.Module):

        def __init__(self):
            super(Unet, self).__init__()

            # Down hill 1
            self.conv1 = nn.Conv3d(1, 2, kernel_size=3, stride=1)
            self.conv2 = nn.Conv3d(2, 2, kernel_size=3, stride=1)

            # Down hill 2
            self.conv3 = nn.Conv3d(2, 4, kernel_size=3, stride=1)
            self.conv4 = nn.Conv3d(4, 4, kernel_size=3, stride=1)

            # Bottom
            self.convbottom1 = nn.Conv3d(4, 8, kernel_size=3, stride=1)
            self.convbottom2 = nn.Conv3d(8, 8, kernel_size=3, stride=1)

            # Up hill 1
            self.upConv0 = nn.Conv3d(8, 4, kernel_size=3, stride=1)
            self.upConv1 = nn.Conv3d(4, 4, kernel_size=3, stride=1)
            self.upConv2 = nn.Conv3d(4, 2, kernel_size=3, stride=1)

            # Up hill 2
            self.upConv3 = nn.Conv3d(2, 2, kernel_size=3, stride=1)
            self.upConv4 = nn.Conv3d(2, 1, kernel_size=1, stride=1)

            self.mp = nn.MaxPool3d(kernel_size=3, stride=2, padding=1)
            # some more irrelevant properties...

The forward function looks like:

    import torch.nn.functional as F

    # (method of the Unet class above)
    def forward(self, input):
        # Use U-net theory to update the filters.
        # Example approach...
        input = F.relu(self.conv1(input))
        input = F.relu(self.conv2(input))

        input = self.mp(input)

        input = F.relu(self.conv3(input))
        input = F.relu(self.conv4(input))

        input = self.mp(input)

        input = F.relu(self.convbottom1(input))
        input = F.relu(self.convbottom2(input))

        input = F.interpolate(input, scale_factor=2, mode='trilinear')

        input = F.relu(self.upConv0(input))
        input = F.relu(self.upConv1(input))

        input = F.interpolate(input, scale_factor=2, mode='trilinear')

        input = F.relu(self.upConv2(input))
        input = F.relu(self.upConv3(input))

        input = F.relu(self.upConv4(input))

        return input

I have followed the approach of every example and all the documentation I could find, and it is beyond me why it doesn't work.

As far as I can tell, y_pred.grad is None after the backward call, which it shouldn't be. If we have no gradient then of course the optimizer can't change the weights in any direction, but why is there no gradient?
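
One way to see where the gradient stops is to inspect the parameters' .grad directly; the sketch below assumes the training snippet above has already run loss.backward(). Note that y_pred is a non-leaf tensor, so its .grad stays None after backward unless y_pred.retain_grad() was called first, which makes the parameters the more reliable thing to check:

    # Sketch: after loss.backward(), check how much gradient each parameter received.
    for name, p in unet.named_parameters():
        g = p.grad
        print(name, None if g is None else g.abs().sum().item())
    # None means the graph never reached this parameter; all zeros points at dead ReLUs.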

I identified this problem as an instance of the "dying ReLU problem". Because the data are Hounsfield units, and because PyTorch's uniform distribution of initial weights meant that many neurons would start out in ReLU's zero region, those neurons were left paralyzed and dependent on other neurons to produce a gradient that could pull them out of the zero region. That is unlikely to happen, and as training progresses all neurons instead get pushed into ReLU's zero region.
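
A toy illustration of the effect (a sketch with made-up numbers, not the original setup): when the pre-activation is negative for every sample, ReLU outputs zero and passes back exactly zero gradient, so the weights can never recover:

    import torch
    import torch.nn.functional as F

    x = torch.randn(8, 4) - 1000.0                 # strongly negative, Hounsfield-style inputs
    w = torch.full((4, 1), 0.1, requires_grad=True)

    out = F.relu(x @ w)          # every pre-activation is around -400, so the output is all zeros
    out.sum().backward()
    print(out.abs().sum().item())  # 0.0
    print(w.grad)                  # all zeros: no signal can update w, the neuron is "dead"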

There are several solutions to this problem. You can use leaky ReLU or another activation function that does not have a zero region, as sketched below.
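
A minimal sketch of that swap, using this model's forward as the template (only the first block is shown; the remaining layers would change the same way):

    import torch.nn.functional as F

    def forward(self, input):
        # leaky_relu keeps a small slope (0.01 by default) for negative
        # pre-activations, so the gradient never becomes exactly zero.
        input = F.leaky_relu(self.conv1(input))
        input = F.leaky_relu(self.conv2(input))
        input = self.mp(input)
        # ... the rest of the layers, with F.relu replaced the same way ...
        return input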

You can also normalize the input data using batch normalization and initialize the weights so that they are positive only; a sketch of both follows.
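
What that could look like for this model (the BatchNorm3d placement and the positive uniform range are illustrative assumptions, not prescribed here):

    import torch.nn as nn

    class UnetBN(Unet):
        # Illustrative variant: a norm layer for the first block plus positive-only init.
        def __init__(self):
            super().__init__()
            self.bn1 = nn.BatchNorm3d(2)  # 2 = conv2's output channels
            for m in self.modules():
                if isinstance(m, nn.Conv3d):
                    nn.init.uniform_(m.weight, 0.0, 0.1)  # positive-only range (illustrative)
                    if m.bias is not None:
                        nn.init.zeros_(m.bias)

In forward, the matching change for the first block would be input = self.bn1(F.relu(self.conv2(input))), and similarly for the other blocks.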

Solution number two is probably the better one, since both solve the problem, but leaky ReLU will prolong training, whereas batch normalization does the opposite and also increases accuracy. On the other hand, leaky ReLU is an easy fix, whereas the other solution requires a little extra work.

For Hounsfield data, one could also add a constant of 1000 to the input, eliminating negative units from the data. This still requires a different weight initialization than PyTorch's standard initialization.
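
The shift itself is a one-liner (hu_volume is a placeholder name for the raw CT tensor; the +1000 offset assumes air sits at about -1000 HU, the usual lower bound):

    # Shift raw Hounsfield units so the minimum (air, about -1000 HU) maps to 0:
    x = hu_volume + 1000.0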

I do not think the weights should be printed with the command you use. Try print(unet.conv1.state_dict()["weight"]) instead of print(unet.conv1.weight.data[0][0]).
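
Whichever way the weights are printed, a more direct check for whether the optimizer changed anything is to snapshot a parameter before the step and compare after; a small sketch (run once loss.backward() has populated the gradients):

    import torch

    before = unet.conv1.weight.detach().clone()
    optimizer.step()
    after = unet.conv1.weight.detach().clone()
    print(torch.equal(before, after))           # True means the step changed nothing
    print((after - before).abs().max().item())  # size of the largest single update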
