
Validation loss not moving with MLP in Regression

Given input features as such, just raw numbers:

tensor([0.2153, 0.2190, 0.0685, 0.2127, 0.2145, 0.1260, 0.1480, 0.1483, 0.1489,
        0.1400, 0.1906, 0.1876, 0.1900, 0.1925, 0.0149, 0.1857, 0.1871, 0.2715,
        0.1887, 0.1804, 0.1656, 0.1665, 0.1137, 0.1668, 0.1168, 0.0278, 0.1170,
        0.1189, 0.1163, 0.2337, 0.2319, 0.2315, 0.2325, 0.0519, 0.0594, 0.0603,
        0.0586, 0.0067, 0.0624, 0.2691, 0.0617, 0.2790, 0.2805, 0.2848, 0.2454,
        0.1268, 0.2483, 0.2454, 0.2475], device='cuda:0')

And the expected output is a single real number, e.g.

tensor(-34.8500, device='cuda:0')

Full code on https://www.kaggle.com/alvations/pytorch-mlp-regression

I've tried creating a simple 2-layer network with:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super(MLP, self).__init__()
        self.linear = nn.Linear(input_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, output_size)

    def forward(self, inputs, hidden=None, dropout=0.5):
        inputs = F.dropout(inputs, dropout) # Drop-in.
        # First Layer.
        output = F.relu(self.linear(inputs))

        # Matrix manipulation magic.
        batch_size, sequence_len, hidden_size = output.shape
        # Technically, linear layer takes a 2-D matrix as input, so more manipulation...
        output = output.contiguous().view(batch_size * sequence_len, hidden_size)
        # Apply dropout.
        output = F.dropout(output, dropout)

        # Put it through the classifier
        # And reshape it to [batch_size x sequence_len x vocab_size]
        output = self.classifier(output).view(batch_size, sequence_len, -1)

        return output
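As a quick sanity check of the shapes, here is a minimal sketch assuming the sizes from the question (49 input features, as in the tensor above; batch size 500; inputs reshaped to [batch_size, 1, input_size] as in the training loop below):

# Hypothetical shape check of the forward pass.
model = MLP(input_size=49, output_size=1, hidden_size=150)
x = torch.randn(500, 1, 49)   # [batch_size, sequence_len, input_size]
out = model(x)
print(out.shape)              # torch.Size([500, 1, 1]), one value per example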

And training as such:

# Training routine.
def train(num_epochs, dataloader, valid_dataset, model, criterion, optimizer):
    losses = []
    valid_losses = []
    learning_rates = []
    plt.ion()
    x_valid, y_valid = valid_dataset
    for _e in range(num_epochs):
        for batch in tqdm(dataloader):
            # Zero gradient.
            optimizer.zero_grad()
            #print(batch)
            this_x = torch.tensor(batch['x'].view(len(batch['x']), 1, -1)).to(device)
            this_y = torch.tensor(batch['y'].view(len(batch['y']), 1, 1)).to(device)

            # Feed forward. 
            output = model(this_x)

            prediction, _ = torch.max(output, dim=1)
            loss = criterion(prediction, this_y.view(len(batch['y']), -1))
            loss.backward()
            optimizer.step()
            losses.append(torch.sqrt(loss.float()).data)

            with torch.no_grad():
                # Zero gradient.
                optimizer.zero_grad()
                output = model(x_valid.view(len(x_valid), 1, -1))
                prediction, _ = torch.max(output, dim=1)
                loss = criterion(prediction, y_valid.view(len(y_valid), -1))
                valid_losses.append(torch.sqrt(loss.float()).data)

            clear_output(wait=True)
            plt.plot(losses, label='Train')
            plt.plot(valid_losses, label='Valid')
            plt.legend()
            plt.pause(0.05)

Tuning several hyperparameters, it looks like the model doesn't train well; the validation loss doesn't move at all, e.g. with:

hyperparams = Hyperparams(input_size=train_dataset.x.shape[1], 
                          output_size=1, 
                          hidden_size=150, 
                          loss_func=nn.MSELoss,
                          learning_rate=1e-8, 
                          optimizer=optim.Adam, 
                          batch_size=500)
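Hyperparams itself isn't shown here; presumably it's a simple container defined in the linked notebook. A minimal sketch, assuming a namedtuple and that the fields are wired up as follows (names are guesses from the snippet above; note that loss_func and optimizer hold classes, which get instantiated here):

from collections import namedtuple

# Assumed definition; the real one is in the linked Kaggle notebook.
Hyperparams = namedtuple('Hyperparams',
                         ['input_size', 'output_size', 'hidden_size',
                          'loss_func', 'learning_rate', 'optimizer', 'batch_size'])

model = MLP(hyperparams.input_size, hyperparams.output_size,
            hyperparams.hidden_size).to(device)
criterion = hyperparams.loss_func()
optimizer = hyperparams.optimizer(model.parameters(), lr=hyperparams.learning_rate)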

And its loss curve:

[Image: training and validation loss curves, both essentially flat]

Any idea what's wrong with the network?

Am I training the regression model with the wrong loss? Or have I just not found the right hyperparameters?

Or am I validating the model wrongly?

From the code you provided, it is tough to say why the validation loss is constant, but I see several problems in your code.

  1. Why do you validate after each training mini-batch? Instead, you should validate your model after training for one complete epoch (one iteration over your full dataset). So the skeleton should look like:
for _e in range(num_epochs):
    for batch in tqdm(train_dataloader):
        # training code

    with torch.no_grad():
        for batch in tqdm(valid_dataloader):
            # validation code

    # plot your loss values

Also, you can plot after each epoch, not after each mini-batch.

  2. Did you check whether the model parameters are actually getting updated after optimizer.step() during training? How many validation examples do you have? Why don't you use mini-batch computation during validation as well?

  3. Why do you call optimizer.zero_grad() during validation? It doesn't make sense: during validation, you are not doing anything related to optimization.

  4. You should use model.eval() during validation to turn off dropout. See the PyTorch documentation to learn about the .train() and .eval() methods. Note that model.eval() only affects nn.Dropout modules (or F.dropout calls that pass training=self.training); your F.dropout calls use the default training=True, so they stay active regardless of the model's mode.

  5. The learning rate is set to 1e-8; isn't that too small? Why don't you use the default learning rate for Adam (1e-3)? A consolidated sketch applying points 1-5 follows this list.
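Putting those five points together, a minimal sketch of what the corrected loop could look like. It assumes the notebook's num_epochs, device, and train_dataloader, plus a hypothetical valid_dataloader built the same way; it also assumes the MLP's F.dropout calls are made mode-aware (training=self.training) so that model.eval() actually disables them, and it drops the torch.max step since sequence_len is 1:

import torch
import torch.nn as nn
import torch.optim as optim
from tqdm import tqdm

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # Adam's default rate.

for _e in range(num_epochs):
    model.train()  # dropout active during training
    train_loss = 0.0
    for batch in tqdm(train_dataloader):
        optimizer.zero_grad()
        this_x = batch['x'].view(len(batch['x']), 1, -1).to(device)
        this_y = batch['y'].view(len(batch['y']), 1).to(device)
        output = model(this_x)                              # [batch, 1, 1]
        loss = criterion(output.view(len(this_y), -1), this_y)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    model.eval()  # dropout disabled during validation
    valid_loss = 0.0
    with torch.no_grad():  # no zero_grad() needed: nothing is optimized here
        for batch in tqdm(valid_dataloader):
            this_x = batch['x'].view(len(batch['x']), 1, -1).to(device)
            this_y = batch['y'].view(len(batch['y']), 1).to(device)
            output = model(this_x)
            valid_loss += criterion(output.view(len(this_y), -1), this_y).item()

    # Plot or print once per epoch (raw MSE, no square root needed).
    print(_e, train_loss / len(train_dataloader),
          valid_loss / len(valid_dataloader))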

The following requires some reasoning.

  1. Why are you using such a large batch size? What is your training dataset size?

  2. You can plot the MSE loss directly, instead of taking the square root.

My suggestion would be: use some existing resources for MLPs in PyTorch. Don't build it from scratch if you do not have sufficient knowledge at this point; it will make you suffer a lot.
