PyTorch Linear Regression Issue

Question

I am trying to implement a simple linear model in PyTorch that can be given x data and y data, and then trained to recognize the equation y = mx + b. However, whenever I try to test my model after training, it thinks that the equation is y= mx + 2b. I'll show my code, and hopefully someone will be able to spot an issue. Thank you in advance for any help.

import torch

D_in = 500
D_out = 500
batch=200
model=torch.nn.Sequential(
     torch.nn.Linear(D_in,D_out),
)

Next I create some data and set a rule. Let's do 3x+4.

x_data=torch.rand(batch,D_in)
y_data=torch.randn(batch,D_out)

for i in range(batch):
    for j in range(D_in):
         y_data[i][j]=3*x_data[i][j]+5 # model thinks y=mx+c -> y=mx+2c?

loss_fn=torch.nn.MSELoss(size_average=False)
optimizer=torch.optim.Adam(model.parameters(),lr=0.001)

Now to training...

for epoch in range(500):
    y_pred=model(x_data)
    loss=loss_fn(y_pred,y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Then I test my model with a Tensor/matrix of just 1's.

test_data=torch.ones(batch,D_in) 
y_pred=model(test_data)

Now, I'd expect to get 3*1 + 4 = 7, but instead, my model thinks it is 11.

[[ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,
      10.7516],
    [ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,
      10.7516],
    [ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,
      10.7516],
    ...,
    [ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,
      10.7516],
    [ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,
      10.7516],
    [ 10.7286,  11.0499,  10.9448,  ...,  11.0812,  10.9387,
      10.7516]])

Similarly, if I change the rule to y=3x+8, my model guesses 19. So, I am not sure what is going on. Why is the constant being added twice? By the way, if I just set the rule to y=3x, my model correctly infers 3, and for y=mx in general my model correctly infers m. For some reason, the constant term is throwing it off. Any help to solve this problem is much appreciated. Thanks!

Answer 1

Your network does not learn long enough. It gets a vector with 500 features to describe a single datum.

Your network has to map the big input of 500 features to an output including 500 values. Your trainingdata is randomly created, not like your simple example, so I think you just have to train longer to fit your weights to approximate this function from R^500 to R^500.

If I reduce the input and output dimensionality and increase the batch size, learning rate and training steps I get the expected result:

import torch

D_in = 100
D_out = 100
batch = 512

model=torch.nn.Sequential(
     torch.nn.Linear(D_in,D_out),
)

x_data=torch.rand(batch,D_in)
y_data=torch.randn(batch,D_out)
for i in range(batch):
    for j in range(D_in):
         y_data[i][j]=3*x_data[i][j]+4 # model thinks y=mx+c -> y=mx+2c?

loss_fn=torch.nn.MSELoss(size_average=False)
optimizer=torch.optim.Adam(model.parameters(),lr=0.01)

for epoch in range(10000):
    y_pred=model(x_data)
    loss=loss_fn(y_pred,y_data)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

test_data=torch.ones(batch,D_in)
y_pred=model(test_data)
print(y_pred)

If you just want to approximate f(x) = 3x + 4 with only one input you could also set D_in and D_out to 1.

PyTorch Linear Regression Issue

Question

1 answers

solution1
2 ACCPTED 2018-07-05 20:38:15

PyTorch Linear Regression Issue

Question

1 answers

solution1 2 ACCPTED 2018-07-05 20:38:15

solution1
2 ACCPTED 2018-07-05 20:38:15