
Why is my simple feedforward neural network diverging (PyTorch)?

I am experimenting with a simple 2-layer neural network in PyTorch, feeding in only three inputs of size 10 each, with a single value as output. I have normalized the inputs and lowered the learning rate. It is my understanding that a two-layer fully connected neural network should be able to trivially fit this data.

Features:

0.8138  1.2342  0.4419  0.8273  0.0728  2.4576  0.3800  0.0512  0.6872  0.5201
1.5666  1.3955  1.0436  0.1602  0.1688  0.2074  0.8810  0.9155  0.9641  1.3668
1.7091  0.9091  0.5058  0.6149  0.3669  0.1365  0.3442  0.9482  1.2550  1.6950
[torch.FloatTensor of size 3x10]


Targets:
[124, 125, 122]
[torch.FloatTensor of size 3]

The code is adapted from a simple example, and I am using MSELoss as the loss function. The loss diverges to infinity after just a few iterations:

import numpy as np
import torch
from torch.autograd import Variable

# features and targets are the 3x10 input matrix and the 3 target values shown above
features = torch.from_numpy(np.array(features))

x_data = Variable(torch.Tensor(features))
y_data = Variable(torch.Tensor(targets))

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(10, 5)   # input layer: 10 features -> 5 hidden units
        self.linear2 = torch.nn.Linear(5, 1)   # output layer: 5 hidden units -> 1 prediction

    def forward(self, x):
        l_out1 = self.linear(x)
        y_pred = self.linear2(l_out1)
        return y_pred

model = Model()

criterion = torch.nn.MSELoss(size_average=False)
optim = torch.optim.SGD(model.parameters(), lr=0.001)

def main():
    for iteration in range(1000):
        y_pred = model(x_data)
        loss = criterion(y_pred, y_data)

        print(iteration, loss.data[0])
        optim.zero_grad()

        loss.backward()
        optim.step()

main()

Any help would be appreciated. Thanks

EDIT:

Indeed, it seems that this was simply due to the learning rate being too high. Setting it to 0.00001 fixes the convergence issues, albeit giving very slow convergence.
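
For reference, a minimal sketch of that change, keeping the SGD optimizer from the code above:

optim = torch.optim.SGD(model.parameters(), lr=0.00001)  # much smaller step size to keep the loss from blowing up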

Maybe you can try to predict log(y) instead of y to improve the convergence even more. The Adam optimizer (adaptive learning rate) should also help, as should BatchNormalization (for example between your linear layers).
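
A rough sketch of how these suggestions could be combined, assuming the same 10 -> 5 -> 1 architecture and the x_data / y_data from the question; the ModelBN name, the BatchNorm1d placement, and the log-target transform are illustrative choices, not code from the original post:

import torch

class ModelBN(torch.nn.Module):
    def __init__(self):
        super(ModelBN, self).__init__()
        self.linear = torch.nn.Linear(10, 5)
        self.bn = torch.nn.BatchNorm1d(5)      # normalizes the hidden activations across the batch
        self.linear2 = torch.nn.Linear(5, 1)

    def forward(self, x):
        h = self.bn(self.linear(x))
        return self.linear2(h)

model = ModelBN()
criterion = torch.nn.MSELoss()
optim = torch.optim.Adam(model.parameters(), lr=0.01)  # adaptive per-parameter learning rates

y_log = torch.log(y_data)                    # train against log(y) instead of y
for iteration in range(1000):
    y_pred = model(x_data).squeeze(1)        # squeeze (3,1) -> (3,) to match the targets
    loss = criterion(y_pred, y_log)
    optim.zero_grad()
    loss.backward()
    optim.step()

Predictions on the original scale can then be recovered with torch.exp(model(x_data)).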

This is because you're not using a non-linearity between the layers, and your network is still linear.

You can use ReLU to make it non-linear. You can change the forward method like this:

...
y_pred = torch.nn.functional.relu(self.linear2(l_out1))
...
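
Note that this applies the ReLU to the final output. A common alternative, sketched below under the assumption that the output should remain an unbounded regression value, is to put the non-linearity on the hidden layer instead:

import torch.nn.functional as F

def forward(self, x):
    l_out1 = F.relu(self.linear(x))   # non-linearity on the hidden layer
    y_pred = self.linear2(l_out1)     # output layer stays linear for regression
    return y_pred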
