PyTorch: "Model weights not changing"

Question

Can someone help me understand why the weights are not updating?

    import torch

    unet = Unet()
    optimizer = torch.optim.Adam(unet.parameters(), lr=0.001)
    loss_fn = torch.nn.MSELoss()
    # Variable is deprecated since PyTorch 0.4; plain tensors carry autograd state
    input = torch.randn(32, 1, 64, 64, 64, requires_grad=True)
    target = torch.randn(32, 1, 64, 64, 64)

    optimizer.zero_grad()
    y_pred = unet(input)
    y = target[:, :, 20:44, 20:44, 20:44]  # crop the target to the 24^3 output

    loss = loss_fn(y_pred, y)
    print(unet.conv1.weight.data[0][0])  # weights of the first layer in the unet
    loss.backward()
    optimizer.step()
    print(unet.conv1.weight.data[0][0])  # weights haven't changed

The model is defined as follows:

    import torch.nn as nn
    import torch.nn.functional as F

    class Unet(nn.Module):

        def __init__(self):
            super(Unet, self).__init__()

            # Down hill1
            self.conv1 = nn.Conv3d(1, 2, kernel_size=3, stride=1)
            self.conv2 = nn.Conv3d(2, 2, kernel_size=3, stride=1)

            # Down hill2
            self.conv3 = nn.Conv3d(2, 4, kernel_size=3, stride=1)
            self.conv4 = nn.Conv3d(4, 4, kernel_size=3, stride=1)

            # bottom
            self.convbottom1 = nn.Conv3d(4, 8, kernel_size=3, stride=1)
            self.convbottom2 = nn.Conv3d(8, 8, kernel_size=3, stride=1)

            # up hill1
            self.upConv0 = nn.Conv3d(8, 4, kernel_size=3, stride=1)
            self.upConv1 = nn.Conv3d(4, 4, kernel_size=3, stride=1)
            self.upConv2 = nn.Conv3d(4, 2, kernel_size=3, stride=1)

            # up hill2
            self.upConv3 = nn.Conv3d(2, 2, kernel_size=3, stride=1)
            self.upConv4 = nn.Conv3d(2, 1, kernel_size=1, stride=1)

            self.mp = nn.MaxPool3d(kernel_size=3, stride=2, padding=1)
            # some more irrelevant properties...

The forward function looks like this:

        def forward(self, input):
            # Use U-net Theory to Update the filters.
            # Example Approach...
            input = F.relu(self.conv1(input))
            input = F.relu(self.conv2(input))

            input = self.mp(input)

            input = F.relu(self.conv3(input))
            input = F.relu(self.conv4(input))

            input = self.mp(input)

            input = F.relu(self.convbottom1(input))
            input = F.relu(self.convbottom2(input))

            input = F.interpolate(input, scale_factor=2, mode='trilinear')

            input = F.relu(self.upConv0(input))
            input = F.relu(self.upConv1(input))

            input = F.interpolate(input, scale_factor=2, mode='trilinear')

            input = F.relu(self.upConv2(input))
            input = F.relu(self.upConv3(input))

            input = F.relu(self.upConv4(input))

            return input

I followed the approach of every example and piece of documentation I could find, so I cannot understand why this does not work.

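What I have worked out so far is that y_pred.grad is None after the backward call, which it should not be. Without a gradient the optimizer obviously cannot move the weights in any direction, but why is there no gradient?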
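
A note on this check, as a minimal sketch: `y_pred` is a non-leaf tensor (the result of an operation), and PyTorch only keeps `.grad` on non-leaf tensors if `retain_grad()` was called before `backward()`. Whether gradients reached the model is better judged from the parameters' own `.grad`. The small `nn.Linear` model below is a stand-in chosen for brevity, not the Unet from the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)          # stand-in model (assumption, not the Unet)
x = torch.randn(8, 4)

y_pred = model(x)                # non-leaf tensor: produced by an operation
print(y_pred.is_leaf)            # False
y_pred.retain_grad()             # ask autograd to keep y_pred.grad
loss = y_pred.pow(2).mean()
loss.backward()

print(y_pred.grad is None)       # False, but only because of retain_grad()
for name, p in model.named_parameters():
    # Parameters are leaf tensors, so backward() populates their .grad
    print(name, p.grad.abs().sum().item())
```

If the parameter gradients printed this way are all exactly zero, the gradient exists but carries no signal, which is a different situation from `.grad` being `None`.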

Answer 1

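I traced this down to the "dying ReLU problem". Because the data are in Hounsfield units and PyTorch draws its initial weights from a uniform distribution, many neurons start out in ReLU's zero region. That leaves them paralyzed and dependent on other neurons to produce a gradient that could pull them out of the zero region. Such a rescue is unlikely; instead, as training progresses, all the neurons get pushed into ReLU's zero region.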
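
To make the mechanism concrete, here is a minimal sketch of a dead ReLU layer. The strictly positive input and the hand-set negative weights are assumptions chosen to force every pre-activation below zero; they are not values from the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer = nn.Linear(3, 2)
with torch.no_grad():
    layer.weight.fill_(-1.0)     # hand-set weights (assumption for the demo)
    layer.bias.zero_()

x = torch.rand(16, 3) + 0.1      # strictly positive inputs
pre = layer(x)                   # every pre-activation is negative
out = F.relu(pre)                # so every output is exactly zero

loss = (out - 1.0).pow(2).mean()
loss.backward()

# ReLU's derivative is zero for negative inputs, so no gradient reaches the weights
print(out.abs().sum().item(), layer.weight.grad.abs().sum().item())
```

With gradients that are identically zero, an optimizer step leaves the weights where they were, which matches the symptom in the question.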

There are several solutions. You can use leaky_relu or another activation function that has no zero region.
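
As a rough illustration of this first fix, the sketch below feeds identical negative pre-activations to `F.relu` and `F.leaky_relu` and compares the gradients. The value `negative_slope=0.01` is PyTorch's default; the input values are made up for the demo.

```python
import torch
import torch.nn.functional as F

# Negative pre-activations, i.e. units sitting in ReLU's zero region
pre_relu = torch.tensor([-3.0, -1.0, -0.5], requires_grad=True)
pre_leaky = pre_relu.detach().clone().requires_grad_(True)

F.relu(pre_relu).sum().backward()
F.leaky_relu(pre_leaky, negative_slope=0.01).sum().backward()

print(pre_relu.grad)    # all zeros: nothing to learn from
print(pre_leaky.grad)   # small but nonzero slope on the negative side
```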

You can also normalize the input data with batch normalization and initialize the weights to be positive only.

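The second solution is probably the better one. Both solve the problem, but leaky_relu prolongs training, whereas batch normalization does the opposite and also improves accuracy. On the other hand, leaky_relu is an easy fix, while the other solution takes a little extra work.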
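
A minimal sketch of the second fix, under stated assumptions: the helper `make_block` and the positive-only range `(0, 0.1)` are hypothetical choices for illustration, not code from this answer. It applies `nn.BatchNorm3d` to the input and re-initializes the convolution weights with `nn.init.uniform_` so that none are negative.

```python
import torch
import torch.nn as nn

def make_block(in_ch, out_ch):
    """Hypothetical helper: BatchNorm on the input, then Conv3d + ReLU."""
    conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=1)
    nn.init.uniform_(conv.weight, a=0.0, b=0.1)   # positive-only weights (assumed range)
    nn.init.zeros_(conv.bias)
    return nn.Sequential(nn.BatchNorm3d(in_ch), conv, nn.ReLU())

torch.manual_seed(0)
block = make_block(1, 2)
x = torch.randn(4, 1, 16, 16, 16) * 500 - 300    # made-up, HU-like value range
out = block(x)
out.mean().backward()

print(out.shape)                                  # torch.Size([4, 2, 14, 14, 14])
print(block[1].weight.grad.abs().sum().item() > 0)
```

Whether a positive-only initialization helps will depend on the data and the rest of the network, so treat the range above as a starting point to experiment with.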

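For Hounsfield data, you could also add the constant 1000 to the input to eliminate the negative values from the data. This still requires a weight initialization different from PyTorch's standard one.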
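
The offset itself can be sketched in a few lines. The sample values below are made up, and the sketch assumes that air at about -1000 HU is the lowest meaningful value; the clamp guards against padding values below that, which some scans contain.

```python
import torch

# Made-up Hounsfield samples, roughly: air, fat, water, soft tissue, bone
hu = torch.tensor([-1000.0, -100.0, 0.0, 40.0, 700.0])

# Shift by 1000 so air maps to 0; clamp anything that was below -1000
shifted = (hu + 1000.0).clamp(min=0.0)

print(shifted.min().item())   # 0.0, no negative values remain
```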

Answer 2

I don't think the weights should be printed with the command you used. Try print(unet.conv1.state_dict()["weight"]) instead of print(unet.conv1.weight.data[0][0]).
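
Whichever accessor is used for printing, one less error-prone way to check for an update is to snapshot the weights with `.detach().clone()` before the step and compare afterwards, since `weight.data` and `state_dict()["weight"]` both refer to the live parameter storage. A minimal sketch, using a single stand-in `nn.Conv3d` layer rather than the full Unet:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv3d(1, 2, kernel_size=3)            # stand-in layer (assumption)
optimizer = torch.optim.Adam(conv.parameters(), lr=0.001)

before = conv.weight.detach().clone()            # independent copy of the weights

optimizer.zero_grad()
loss = conv(torch.randn(2, 1, 8, 8, 8)).pow(2).mean()
loss.backward()
optimizer.step()

after = conv.state_dict()["weight"]
print(torch.equal(before, after))                # False when the step changed the weights
print((after - before).abs().max().item())       # size of the largest change
```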

Answer 1
1 Accepted 2019-03-06 16:54:33

Answer 2
0 2018-11-24 21:29:18
