Validation and training losses that keep diverging

Question

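I have been working with autoencoders for a few weeks, but I seem to have hit a wall in my understanding of the overall loss. When I add BatchNorm and Dropout layers to my model, the losses do not converge and the reconstructions are poor. A typical loss plot looks like this:

[figure: typical loss plot]

The loss I use is MSE with L1 regularization, and looks like this: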

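import torch
import torch.nn as nn
import torch.nn.functional as F

# `validate` needs a default because it follows the defaulted `reg_param`.
def L1_loss_fcn(model_children, true_data, reconstructed_data, reg_param=0.1, validate=False):
    mse = nn.MSELoss()
    mse_loss = mse(reconstructed_data, true_data)

    l1_loss = 0
    values = true_data
    if validate == False:
        # Training: add an L1 penalty on the ReLU activations of each child layer.
        for i in range(len(model_children)):
            values = F.relu((model_children[i](values)))
            l1_loss += torch.sum(torch.abs(values))

        loss = mse_loss + reg_param * l1_loss
        return loss, mse_loss, l1_loss
    else:
        # Validation: return the reconstruction MSE only.
        return mse_loss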

My training loop is written as:

    from tqdm import tqdm

    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    for epoch in range(epochs):
        print(f"Epoch {epoch + 1} of {epochs}")
        # Reset the running totals so each epoch's average stands alone.
        train_run_loss = 0
        val_run_loss = 0

        # TRAINING
        model.train()
        for data in tqdm(train_dl):
            x, _ = data
            optimizer.zero_grad()
            reconstructions = model(x)
            train_loss, mse_loss, l1_loss = L1_loss_fcn(
                model_children=model_children, true_data=x, reg_param=regular_param,
                reconstructed_data=reconstructions, validate=False)
            train_loss.backward()
            optimizer.step()
            train_run_loss += train_loss.item()

        # VALIDATING
        model.eval()
        with torch.no_grad():
            for data in tqdm(test_dl):
                x, _ = data
                reconstructions = model(x)
                val_loss = L1_loss_fcn(
                    model_children=model_children, true_data=x, reg_param=regular_param,
                    reconstructed_data=reconstructions, validate=True)
                val_run_loss += val_loss.item()

        # Average loss per batch for this epoch.
        epoch_loss_train = train_run_loss / len(train_dl)
        epoch_loss_val = val_run_loss / len(test_dl)

I have tried different hyperparameter values with no luck. My model looks like this:

encoder = nn.Sequential(nn.Linear(), nn.Dropout(p=0.5), nn.LeakyReLU(), nn.BatchNorm1d(),
                        nn.Linear(), nn.Dropout(p=0.4), nn.LeakyReLU(), nn.BatchNorm1d(),
                        nn.Linear(), nn.Dropout(p=0.3), nn.LeakyReLU(), nn.BatchNorm1d(),
                        nn.Linear(), nn.Dropout(p=0.2), nn.LeakyReLU(), nn.BatchNorm1d(),
)
decoder = nn.Sequential(nn.Linear(), nn.Dropout(p=0.2), nn.LeakyReLU(),
                        nn.Linear(), nn.Dropout(p=0.3), nn.LeakyReLU(), 
                        nn.Linear(), nn.Dropout(p=0.4), nn.LeakyReLU(), 
                        nn.Linear(), nn.Dropout(p=0.5), nn.ReLU(), 
)

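What I expected was for the training and validation losses to converge, and with that much better reconstructions overall, but I fear I am missing something serious. Any help would be greatly appreciated!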

Answer 1

You are not comparing apples to apples. Your code reads:

    l1_loss = 0
    values = true_data
    if validate == False:
        for i in range(len(model_children)):
            values = F.relu((model_children[i](values)))
            l1_loss += torch.sum(torch.abs(values))

        loss = mse_loss + reg_param * l1_loss
        return loss, mse_loss, l1_loss
    else: 
        return mse_loss

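So your validation loss is just the MSE, while your training loss is MSE + regularization, so obviously your training loss will be higher. If you want to compare them, you should log only the training MSE, without the regularizer.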
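As a minimal sketch of that suggestion, the loss function can return the MSE and the L1 penalty separately, so the training loop still optimizes their sum but logs only the MSE. The helper name `loss_components`, the toy layer sizes, and the random batches are assumptions made for illustration and are not part of the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def loss_components(model_children, true_data, reconstructed_data, reg_param=0.1):
    """Return (total, mse, l1) so that each part can be logged separately."""
    mse_loss = F.mse_loss(reconstructed_data, true_data)
    l1_loss = torch.zeros((), device=true_data.device)
    values = true_data
    for child in model_children:
        values = F.relu(child(values))
        l1_loss = l1_loss + values.abs().sum()
    return mse_loss + reg_param * l1_loss, mse_loss, l1_loss


torch.manual_seed(0)
# Toy stand-in for the real autoencoder: 8 -> 4 -> 8, no dropout or batch norm.
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 8))
model_children = list(model.children())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batches = [torch.randn(16, 8) for _ in range(5)]  # hypothetical data

train_mse_run = 0.0    # comparable with the validation loss
train_total_run = 0.0  # what is actually optimized
model.train()
for x in batches:
    optimizer.zero_grad()
    total, mse_loss, l1_loss = loss_components(model_children, x, model(x))
    total.backward()
    optimizer.step()
    train_mse_run += mse_loss.item()
    train_total_run += total.item()

epoch_mse_train = train_mse_run / len(batches)
epoch_total_train = train_total_run / len(batches)
print(f"train MSE {epoch_mse_train:.4f} | train MSE + L1 {epoch_total_train:.4f}")
```

Plotting `epoch_mse_train` against the validation MSE puts both curves on the same scale; `epoch_total_train` is still useful for checking that the optimizer is making progress on the objective it actually sees.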

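Also, do not start with regularization. Always start with a model that has no regularization at all and train it to convergence. Remove all the extra losses and remove your dropout. These things only hurt your ability to learn (though they may improve generalization). Once that works, reintroduce them one at a time.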
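One way to follow this advice is to build the network from a function whose regularizers are switched off by default, train that baseline to convergence, and then enable a single option per experiment. The sketch below is only an illustration: `make_block`, `build_autoencoder`, the layer widths, and the plain linear output layer are assumptions, since the original post does not give the `nn.Linear` dimensions.

```python
import torch
import torch.nn as nn


def make_block(n_in, n_out, dropout=0.0, batchnorm=False):
    """Linear -> LeakyReLU, with optional BatchNorm1d and Dropout."""
    layers = [nn.Linear(n_in, n_out), nn.LeakyReLU()]
    if batchnorm:
        layers.append(nn.BatchNorm1d(n_out))
    if dropout > 0:
        layers.append(nn.Dropout(p=dropout))
    return layers


def build_autoencoder(sizes, dropout=0.0, batchnorm=False):
    """`sizes` lists the widths from input to bottleneck, e.g. [8, 6, 4]."""
    widths = sizes + sizes[-2::-1]  # mirror the encoder: [8, 6, 4, 6, 8]
    pairs = list(zip(widths[:-1], widths[1:]))
    layers = []
    for n_in, n_out in pairs[:-1]:
        layers += make_block(n_in, n_out, dropout, batchnorm)
    # Assumption: a plain linear output layer with no dropout or activation.
    layers.append(nn.Linear(*pairs[-1]))
    return nn.Sequential(*layers)


# Step 1: baseline with no regularization; train this to convergence first.
baseline = build_autoencoder([8, 6, 4])
# Later experiments: change one thing at a time and compare with the baseline.
with_dropout = build_autoencoder([8, 6, 4], dropout=0.2)
with_batchnorm = build_autoencoder([8, 6, 4], batchnorm=True)

x = torch.randn(16, 8)  # hypothetical batch of 16 samples with 8 features
print(baseline(x).shape)
```

Keeping every variant behind the same builder means any difference between two loss curves can be attributed to the single option that changed.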
