
Why is my simple feedforward neural network diverging (PyTorch)?

I am experimenting with a simple 2-layer neural network in PyTorch, feeding in only three inputs of size 10 each, with a single value as output. I have normalized the inputs and lowered the learning rate. It is my understanding that a two-layer fully connected neural network should be able to trivially fit this data.

Features:

0.8138  1.2342  0.4419  0.8273  0.0728  2.4576  0.3800  0.0512  0.6872  0.5201
1.5666  1.3955  1.0436  0.1602  0.1688  0.2074  0.8810  0.9155  0.9641  1.3668
1.7091  0.9091  0.5058  0.6149  0.3669  0.1365  0.3442  0.9482  1.2550  1.6950
[torch.FloatTensor of size 3x10]


Targets:
[124, 125, 122]
[torch.FloatTensor of size 3]

The code is adapted from a simple example, and I am using MSELoss as the loss function. The loss diverges to infinity after just a few iterations:

import numpy as np
import torch
from torch.autograd import Variable

# features and targets are the 3x10 input matrix and the 3 target values shown above
features = torch.from_numpy(np.array(features))

x_data = Variable(torch.Tensor(features))
y_data = Variable(torch.Tensor(targets))

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(10, 5)   # input layer: 10 features -> 5 hidden units
        self.linear2 = torch.nn.Linear(5, 1)   # output layer: 5 hidden units -> 1 prediction

    def forward(self, x):
        l_out1 = self.linear(x)
        y_pred = self.linear2(l_out1)
        return y_pred

model = Model()

criterion = torch.nn.MSELoss(size_average=False)
optim = torch.optim.SGD(model.parameters(), lr=0.001)

def main():
    for iteration in range(1000):
        y_pred = model(x_data)
        loss = criterion(y_pred, y_data)

        print(iteration, loss.data[0])
        optim.zero_grad()

        loss.backward()
        optim.step()

main()

Any help would be appreciated. Thanks

EDIT:

Indeed, it seems that this was simply due to the learning rate being too high. Setting it to 0.00001 fixes the convergence issues, albeit giving very slow convergence.
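
For reference, a minimal sketch of that change, keeping the SGD optimizer from the code above:

optim = torch.optim.SGD(model.parameters(), lr=0.00001)  # much smaller step size to keep the loss from blowing up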

Maybe you can try to predict log(y) instead of y to improve the convergence even more. The Adam optimizer (adaptive learning rate) should also help, as should BatchNormalization (for example between your linear layers).
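
A rough sketch of how these suggestions could be combined, assuming the same 10 -> 5 -> 1 architecture and the x_data / y_data from the question; the ModelBN name, the BatchNorm1d placement, and the log-target transform are illustrative choices, not code from the original post:

import torch

class ModelBN(torch.nn.Module):
    def __init__(self):
        super(ModelBN, self).__init__()
        self.linear = torch.nn.Linear(10, 5)
        self.bn = torch.nn.BatchNorm1d(5)      # normalizes the hidden activations across the batch
        self.linear2 = torch.nn.Linear(5, 1)

    def forward(self, x):
        h = self.bn(self.linear(x))
        return self.linear2(h)

model = ModelBN()
criterion = torch.nn.MSELoss()
optim = torch.optim.Adam(model.parameters(), lr=0.01)  # adaptive per-parameter learning rates

y_log = torch.log(y_data)                    # train against log(y) instead of y
for iteration in range(1000):
    y_pred = model(x_data).squeeze(1)        # squeeze (3,1) -> (3,) to match the targets
    loss = criterion(y_pred, y_log)
    optim.zero_grad()
    loss.backward()
    optim.step()

Predictions on the original scale can then be recovered with torch.exp(model(x_data)).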

This is because you're not using a non-linearity between the layers, and your network is still linear.

You can use ReLU to make it non-linear. You can change the forward method like this:

...
y_pred = torch.nn.functional.relu(self.linear2(l_out1))
...
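
Note that this applies the ReLU to the final output. A common alternative, sketched below under the assumption that the output should remain an unbounded regression value, is to put the non-linearity on the hidden layer instead:

import torch.nn.functional as F

def forward(self, x):
    l_out1 = F.relu(self.linear(x))   # non-linearity on the hidden layer
    y_pred = self.linear2(l_out1)     # output layer stays linear for regression
    return y_pred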
