PyTorch and Polynomial Linear Regression issue

I have modified the code that I found on the PyTorch GitHub to suit my data, but my loss results are huge, and with each iteration they get bigger and later become nan. The code doesn't give me any errors, just no usable loss results and no predictions. I have another piece of code that deals with simple linear regression and everything works fine. I guess I'm missing something simple here, but I'm unable to see it. Any help would be appreciated.

Code:

import sklearn.linear_model as lm
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt
import torch
import torch.autograd
import torch.nn.functional as F
from torch.autograd import Variable


train_data = torch.Tensor([
   [40,  6,  4],
   [44, 10,  4],
   [46, 12,  5],
   [48, 14,  7],
   [52, 16,  9],
   [58, 18, 12],
   [60, 22, 14],
   [68, 24, 20],
   [74, 26, 21],
   [80, 32, 24]])
test_data = torch.Tensor([
    [6, 4],
    [10, 5],
    [4, 8]])

x_train = train_data[:,1:3]
y_train = train_data[:,0]

POLY_DEGREE = 3
input_size = 2
output_size = 1

poly = PolynomialFeatures(input_size * POLY_DEGREE, include_bias=False)
x_train_poly = poly.fit_transform(x_train.numpy())


class Model(torch.nn.Module):

    def __init__(self):
        super(Model, self).__init__()
        self.fc = torch.nn.Linear(poly.n_output_features_, output_size)

    def forward(self, x):
        return self.fc(x)

model = Model()    
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

losses = []

for i in range(10):
    optimizer.zero_grad()
    outputs = model(Variable(torch.Tensor(x_train_poly)))
    print(outputs)
    loss = criterion(outputs, Variable(y_train))
    print(loss.data[0])
    losses.append(loss.data[0])
    loss.backward()    
    optimizer.step()
    if loss.data[0] < 1e-4:
        break    

print('n_iter', i)
print(loss.data[0])
plt.plot(losses)
plt.show()

Output:

[393494300459008.0, inf, inf, inf, nan, nan, nan, nan, nan, nan]

n_iter 9
nan

There are a few things that contribute to the problem. Changing some or all of them will give you reasonable results and make learning possible; a combined sketch of all four changes follows the list.

  1. Some of your (polynomial) features have a huge variance and take on very large values. Check out np.max(x_train_poly). When your weight matrix is randomly initialised, this causes the initial predictions to be largely off and the loss to approach infinity quickly. To counteract this, you may want to standardise your features first (i.e. make the mean 0 and the variance 1 for each feature). Note that a similar idea, called "Batch Normalization", is used in very deep networks. If you're interested, you can read more here: https://arxiv.org/abs/1502.03167. You can do the following to fix your example:

     import numpy as np

     means = np.mean(x_train_poly, axis=0, keepdims=True)
     std = np.std(x_train_poly, axis=0, keepdims=True)
     x_train_poly = (x_train_poly - means) / std
  2. Your current model doesn't have any hidden layers, which is sort of the point of a neural network and of building a non-linear regressor/classifier. What you're doing right now is applying a linear transformation to the 27 input features to get something that is close to the output. You could add an additional layer like this:

     hidden_dim = 50

     class Model(torch.nn.Module):

         def __init__(self):
             super(Model, self).__init__()
             self.layer1 = torch.nn.Linear(poly.n_output_features_, hidden_dim)
             self.layer2 = torch.nn.Linear(hidden_dim, output_size)

         def forward(self, x):
             return self.layer2(torch.nn.ReLU()(self.layer1(x)))

    Note that I have added a non-linearity after the first linear transformation, because otherwise there's no point in having multiple layers.

  3. The initial predictions are greatly off and lead to the loss approaching infinity. You're using squared loss, which essentially doubles the order of magnitude of your initial "mistake" in the loss function. And once the loss is infinity, you'll be unable to escape, because the gradient updates are essentially also infinity, as you're using squared loss. An easy fix that is sometimes useful is to use the smooth L1 loss instead: essentially MSE where the absolute error is below 1 and L1 loss outside that range. Change the following:

     criterion = torch.nn.SmoothL1Loss() 
  4. That already gets you to something sensible (i.e. no infs anymore), but now consider tuning the learning rate and introducing weight_decay. You may also want to change the optimizer. Some suggestions that work alright:

     optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1)
     optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.1)
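
For reference, here is a minimal end-to-end sketch that combines all four changes (standardised features, a hidden layer with a ReLU, SmoothL1Loss, and Adam with weight decay). It assumes a recent PyTorch version where Variable is no longer needed; hidden_dim, the iteration count and the exact hyperparameters are illustrative choices rather than the only reasonable ones:

    import numpy as np
    import torch
    from sklearn.preprocessing import PolynomialFeatures

    train_data = torch.Tensor([
        [40,  6,  4], [44, 10,  4], [46, 12,  5], [48, 14,  7], [52, 16,  9],
        [58, 18, 12], [60, 22, 14], [68, 24, 20], [74, 26, 21], [80, 32, 24]])
    test_data = torch.Tensor([[6, 4], [10, 5], [4, 8]])

    x_train = train_data[:, 1:3]
    y_train = train_data[:, 0].unsqueeze(1)            # shape (10, 1) to match the model output

    poly = PolynomialFeatures(6, include_bias=False)   # same degree (2 * 3) as in the question
    x_train_poly = poly.fit_transform(x_train.numpy())

    # 1. standardise each polynomial feature
    means = np.mean(x_train_poly, axis=0, keepdims=True)
    std = np.std(x_train_poly, axis=0, keepdims=True)
    x_train_poly = torch.Tensor((x_train_poly - means) / std)

    # 2. a hidden layer followed by a non-linearity
    hidden_dim = 50
    model = torch.nn.Sequential(
        torch.nn.Linear(poly.n_output_features_, hidden_dim),
        torch.nn.ReLU(),
        torch.nn.Linear(hidden_dim, 1))

    # 3. robust loss, 4. Adam with weight decay
    criterion = torch.nn.SmoothL1Loss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.1)

    for i in range(1000):
        optimizer.zero_grad()
        loss = criterion(model(x_train_poly), y_train)
        loss.backward()
        optimizer.step()

    print('final loss:', loss.item())

    # predictions for the test points, reusing the fitted transform and training statistics
    x_test_poly = torch.Tensor((poly.transform(test_data.numpy()) - means) / std)
    with torch.no_grad():
        print(model(x_test_poly))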
