
Pytorch and Polynomial Linear Regression issue

I have modified the code that I found on the Pytorch github to suit my data, but my loss results are huge, and with each iteration they get bigger and later become nan. The code doesn't give me any errors, just no loss results and no predictions. I have another code that deals with simple Linear Regression and all works fine. I guess I'm missing something simple here, but I'm unable to see it. Any help would be appreciated.

Code:

import sklearn.linear_model as lm
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt
import torch
import torch.autograd
import torch.nn.functional as F
from torch.autograd import Variable


train_data = torch.Tensor([
   [40,  6,  4],
   [44, 10,  4],
   [46, 12,  5],
   [48, 14,  7],
   [52, 16,  9],
   [58, 18, 12],
   [60, 22, 14],
   [68, 24, 20],
   [74, 26, 21],
   [80, 32, 24]])
test_data = torch.Tensor([
    [6, 4],
    [10, 5],
    [4, 8]])

x_train = train_data[:,1:3]
y_train = train_data[:,0]

POLY_DEGREE = 3
input_size = 2
output_size = 1

poly = PolynomialFeatures(input_size * POLY_DEGREE, include_bias=False)
x_train_poly = poly.fit_transform(x_train.numpy())


class Model(torch.nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.fc = torch.nn.Linear(poly.n_output_features_, output_size)

    def forward(self, x):
        return self.fc(x)

model = Model()    
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

losses = []

for i in range(10):
    optimizer.zero_grad()
    outputs = model(Variable(torch.Tensor(x_train_poly)))
    print(outputs)
    loss = criterion(outputs, Variable(y_train))
    print(loss.data[0])
    losses.append(loss.data[0])
    loss.backward()    
    optimizer.step()
    if loss.data[0] < 1e-4:
        break    

print('n_iter', i)
print(loss.data[0])
plt.plot(losses)
plt.show()

output:

[393494300459008.0, inf, inf, inf, nan, nan, nan, nan, nan, nan]

n_iter 9
nan

There are a couple of things that contribute to the problem. Changing some or all of them will give you reasonable results and make learning possible.

  1. Some of your (polynomial) features have a huge variance and are taking on very large values. Check out np.max(x_train_poly). When your weight matrix is randomly initialised, this causes the initial predictions to be largely off, and the loss to approach infinity quickly. To counteract this, you may want to standardise your features first (i.e. make mean 0 and variance 1 for each feature). Note that in very deep networks a similar idea is used, called "Batch Normalization". If you're interested, you can read more here: https://arxiv.org/abs/1502.03167 You can do the following to fix your example:

     import numpy as np

     means = np.mean(x_train_poly, axis=0, keepdims=True)
     std = np.std(x_train_poly, axis=0, keepdims=True)
     x_train_poly = (x_train_poly - means) / std
  2. Your current model doesn't have any hidden layers, which is sort of the point of a neural network and building a non-linear regressor/classifier. What you're doing right now is applying a linear transformation to the 27 input features to get something that is close to the output. You could add an additional layer like this:

     hidden_dim = 50

     class Model(torch.nn.Module):

         def __init__(self):
             super(Model, self).__init__()
             self.layer1 = torch.nn.Linear(poly.n_output_features_, hidden_dim)
             self.layer2 = torch.nn.Linear(hidden_dim, output_size)

         def forward(self, x):
             return self.layer2(torch.nn.ReLU()(self.layer1(x)))

    Note that I have added a non-linearity after the first linear transformation, because otherwise there's no point in having multiple layers.

  3. The problem of initial predictions that are greatly off in the beginning and lead to the loss approaching infinity. You're using squared loss, which essentially doubles the order of magnitude of your initial "mistake" in the loss function. And once the loss is infinity, you'll be unable to escape, because the gradient updates are essentially also infinity as you're using squared loss. An easy fix that is sometimes useful is to use the smooth L1 loss instead: essentially MSE while the absolute error is in [0, 1] and L1 loss outside that interval (a small numerical comparison of the two losses follows this list). Change the following:

     criterion = torch.nn.SmoothL1Loss() 
  4. That already gets you to something sensible (i.e. no infs anymore), but now consider tuning the learning rate and introducing weight_decay. You may also want to change the optimizer. Some suggestions that work alright are shown below; a rough sketch that combines all four points follows this list:

     optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1)
     optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.1)
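To see why point 3 helps numerically, here is a tiny, self-contained comparison. The error of 1e6 is just an illustrative value, not taken from the data above, and it assumes a PyTorch version where plain tensors can be passed to the loss modules:

import torch

# hypothetical prediction that is off by 1e6
pred = torch.Tensor([1e6])
target = torch.Tensor([0.0])

mse = torch.nn.MSELoss()(pred, target)             # ~1e12: the squared error explodes
smooth_l1 = torch.nn.SmoothL1Loss()(pred, target)  # ~1e6 - 0.5: grows only linearly

print(mse.item(), smooth_l1.item())

Because the smooth L1 loss grows linearly for large errors, its gradients stay bounded, and the updates can't blow up to infinity the way they do with MSE.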
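Putting the four suggestions together, a rough end-to-end sketch could look like the one below. It reuses the data from the question, keeps the same polynomial degree, and standardises the features before training; hidden_dim, the learning rate, weight_decay and the number of iterations are just starting points to tune, not values from the original answer:

import numpy as np
import torch
from sklearn.preprocessing import PolynomialFeatures

train_data = torch.Tensor([
    [40,  6,  4], [44, 10,  4], [46, 12,  5], [48, 14,  7], [52, 16,  9],
    [58, 18, 12], [60, 22, 14], [68, 24, 20], [74, 26, 21], [80, 32, 24]])
x_train = train_data[:, 1:3]
y_train = train_data[:, 0:1]                        # 2-D target, so it matches the model output shape

poly = PolynomialFeatures(6, include_bias=False)    # same degree as input_size * POLY_DEGREE above
x_train_poly = poly.fit_transform(x_train.numpy())

# Suggestion 1: standardise each feature (reuse these means/std for any test data).
means = np.mean(x_train_poly, axis=0, keepdims=True)
std = np.std(x_train_poly, axis=0, keepdims=True)
x_train_poly = (x_train_poly - means) / std

# Suggestion 2: add a hidden layer with a non-linearity.
hidden_dim = 50
model = torch.nn.Sequential(
    torch.nn.Linear(poly.n_output_features_, hidden_dim),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_dim, 1))

# Suggestions 3 and 4: robust loss, different optimizer with weight decay.
criterion = torch.nn.SmoothL1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.1)

inputs = torch.Tensor(x_train_poly)
for i in range(500):
    optimizer.zero_grad()
    loss = criterion(model(inputs), y_train)
    loss.backward()
    optimizer.step()

print(loss.item())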
