Weight Normalization in PyTorch

Question

An important weight normalization technique was introduced in this paper and has been included in PyTorch as follows:

    import torch.nn as nn
    from torch.nn.utils import weight_norm
    # in_channels, out_channels and kernel_size are placeholders
    weight_norm(nn.Conv2d(in_channels, out_channels, kernel_size))

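From the docs I understand that weight_norm does the reparameterization before each forward() pass. But I am not sure whether this reparameterization also happens during inference, when everything runs inside with torch.no_grad() and the model is set to eval() mode.

Can someone please help me understand whether weight_norm is only effective during training, or also in the inference mode described above?

Thanks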

Answer 1

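I tested with "no_grad" and it works!

I am still confused about "remove_weight_norm", though. I use WeightNorm(conv1d) heavily in my model. To export the model, I use the following code, with or without the model's "remove_weight_norm" function, which calls "nn.utils.remove_weight_norm" on all the relevant layers.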

model.load_state_dict(checkpoint)
model = model.eval()
model.remove_weight_norm()  # exported once with and once without this line
remove_hooks(model)
scripted_module = torch.jit.script(model)
torch.jit.save(scripted_module, 'model.pt')

Then I tested the two models using C++ code and libtorch. But the results are not the same.

I would like to know what weight_norm does at inference time. Is it useful?

Answer 2

I finally figured out the problem.

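Batch normalization estimates two statistics during training (a running mean and a running variance) and uses them for inference. It is therefore necessary to call eval() to change its behavior, telling it not to modify them any further.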
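As a small illustration of that point (a sketch only; the layer size and input shape are arbitrary choices of mine), the following checks that a BatchNorm layer updates its running mean in train() mode and leaves it alone in eval() mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(1)
x = torch.rand(4, 1, 5, 5)

bn.train()
before = bn.running_mean.clone()
bn(x)  # a training-mode forward pass updates the running statistics
after_train = bn.running_mean.clone()

bn.eval()
bn(x)  # an eval-mode forward pass leaves them untouched
after_eval = bn.running_mean.clone()

print(torch.equal(before, after_train))      # False: the statistics changed
print(torch.equal(after_train, after_eval))  # True: frozen by eval()
```

This is the mode-dependent behavior that eval() exists for.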

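Then I double-checked the weight normalization paper and found that it is "inherently deterministic". It simply decouples the original weight vector into a magnitude and a direction, as shown below.

w = g * v / ||v||

Obviously, it does not matter whether you use the LHS or the RHS to compute the output. However, by decoupling the weight into the two parameters g and v, passing those to the optimizer, and removing the w parameter, better training is achieved. For the reasons, please refer to the paper, which describes things nicely.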
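To make the decomposition concrete, here is a minimal sketch that rebuilds the weight by hand from the two parameters that torch.nn.utils.weight_norm registers (weight_g and weight_v). It assumes the default dim=0, so the norm is taken per output channel; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

torch.manual_seed(0)
conv = weight_norm(nn.Conv2d(1, 2, kernel_size=3, padding=1))

g = conv.weight_g  # magnitude, shape (out_channels, 1, 1, 1)
v = conv.weight_v  # direction, same shape as the original weight

# Norm of v over every dimension except the output-channel one (dim=0).
v_norm = v.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
w_manual = g * v / v_norm

x = torch.rand(1, 1, 5, 5)
conv(x)  # the forward pre-hook recomputes conv.weight from g and v

print(torch.allclose(conv.weight, w_manual))
```

The optimizer only ever sees weight_g and weight_v; conv.weight is recomputed from them on each forward pass.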

So it does not matter whether weight normalization is removed at test time. To verify this, I tried the following small piece of code.

import torch
import torch.nn as nn
from torch.nn.utils import weight_norm as wn
from torch.nn.utils import remove_weight_norm as wnr

# define the model 'm'
m = wn(nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1, bias=True))

ip = torch.rand(1,1,5,5)
target = torch.rand(1,1,5,5)
l1 = torch.nn.L1Loss()
optimizer = torch.optim.Adam(m.parameters())



# begin training
for _ in range(5):
    optimizer.zero_grad()  # clear gradients left over from the previous step
    out = m(ip)
    loss = l1(out,target)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    m.eval()
    print('\no/p after training with wn: {}'.format(m(ip)))
    wnr(m)
    print('\no/p after training without wn: {}'.format(m(ip)))

# begin testing
m2 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3,padding=1, bias=True)
m2.load_state_dict(m.state_dict())

with torch.no_grad():
    m2.eval()
    out = m2(ip)
    print('\nOutput during testing and without weight_norm: {}'.format(out))

Below is the output:

o/p after training with wn: 
tensor([[[[0.0509, 0.3286, 0.4612, 0.1795, 0.0307],
          [0.1846, 0.3931, 0.5713, 0.2909, 0.4026],
          [0.1716, 0.5971, 0.4297, 0.0845, 0.6172],
          [0.2938, 0.2389, 0.4478, 0.5828, 0.6276],
          [0.1423, 0.2065, 0.5024, 0.3979, 0.3127]]]])

o/p after training without wn: 
tensor([[[[0.0509, 0.3286, 0.4612, 0.1795, 0.0307],
          [0.1846, 0.3931, 0.5713, 0.2909, 0.4026],
          [0.1716, 0.5971, 0.4297, 0.0845, 0.6172],
          [0.2938, 0.2389, 0.4478, 0.5828, 0.6276],
          [0.1423, 0.2065, 0.5024, 0.3979, 0.3127]]]])

Output during testing and without weight_norm: 
tensor([[[[0.0509, 0.3286, 0.4612, 0.1795, 0.0307],
          [0.1846, 0.3931, 0.5713, 0.2909, 0.4026],
          [0.1716, 0.5971, 0.4297, 0.0845, 0.6172],
          [0.2938, 0.2389, 0.4478, 0.5828, 0.6276],
          [0.1423, 0.2065, 0.5024, 0.3979, 0.3127]]]])

Note that all the values are exactly the same, because only a reparameterization takes place.

Regarding:

Then I tested the two models using C++ code and libtorch. But the results are not the same.

See https://github.com/pytorch/pytorch/issues/21275, which reports a TorchScript bug.

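And regarding:

I would like to know what weight_norm does at inference time. Is it useful?

The answer is that it does nothing. It does not matter whether you compute x * 2 or x * (1+1). It is not useful, but it is not harmful either. So it is better to remove it.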
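One practical consequence for export is that removing weight norm changes the parameter names but not the result. A small sketch (Conv1d with arbitrary sizes, picked for illustration) that compares the state_dict keys and the outputs before and after remove_weight_norm:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm, remove_weight_norm

torch.manual_seed(0)
layer = weight_norm(nn.Conv1d(1, 1, kernel_size=3, padding=1))
x = torch.rand(1, 1, 8)

with torch.no_grad():
    out_with = layer(x)
keys_with = sorted(layer.state_dict().keys())

remove_weight_norm(layer)  # folds g and v back into a single `weight`
with torch.no_grad():
    out_without = layer(x)
keys_without = sorted(layer.state_dict().keys())

print(keys_with)     # ['bias', 'weight_g', 'weight_v']
print(keys_without)  # ['bias', 'weight']
print(torch.allclose(out_with, out_without))
```

After removal the layer is a plain convolution, so its state_dict loads into an ordinary nn.Conv1d.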

Answer 3

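It should be active. .eval() affects layers of your network such as Dropout and BatchNorm (see the eval documentation).

.no_grad() reduces memory use and speeds up computation during inference (see the no_grad documentation). I think weight_norm is not affected by either.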
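A quick way to check this, as a sketch: change weight_g in place after a training-mode forward pass, then run the layer under eval() and no_grad(). If the weight is still recomputed from g and v, doubling g should double the output. I use bias=False here only to keep the comparison simple.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

torch.manual_seed(0)
layer = weight_norm(nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False))
x = torch.rand(1, 1, 5, 5)

layer.train()
out_train = layer(x).detach()

# Double the magnitude in place; a stale cached weight would ignore this.
with torch.no_grad():
    layer.weight_g.mul_(2.0)

layer.eval()
with torch.no_grad():
    out_eval = layer(x)

# The output doubles only if the weight was recomputed in eval/no_grad mode.
print(torch.allclose(out_eval, 2.0 * out_train))
```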

Regards


Answer 1
2 2020-06-05 02:04:07

Answer 2
2 Accepted 2020-06-05 11:23:00

Answer 3
0 2020-06-04 08:34:19
