Least squares not working for a set of y's

Question

I am trying to run a least square algorithm using numpy and is having trouble. Can someone please tell me what I am doing wrong in the given code? When I set y to be y = np.power(X, 1) + np.random.rand(20)*3 or some other reasonable function of x, everything is working fine. But for that particular y defined by those given y values, the plot I am getting is senseless.

Is this some kind of numerical problem?

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

X = np.arange(1,21)
y = np.array([-0.00454712, -0.00457764, -0.0045166 , -0.00442505, -0.00427246,
       -0.00411987, -0.00378418, -0.003479  , -0.00314331, -0.00259399,
       -0.00213623, -0.00146484, -0.00082397, -0.00030518,  0.00027466,
        0.00076294,  0.00146484,  0.00192261,  0.00247192,  0.00314331])

#y = np.power(X, 1) + np.random.rand(20)*3

w = np.linalg.lstsq(X.reshape(20, 1), y)[0]

plt.plot(X, y, 'red')
plt.plot(X, X*w[0], 'blue')
plt.show()

Answer 1

Are you sure there is a linear relationship between what you are fitting and the y variable data?

Using the code ( y = np.power(X, 1) + np.random.rand(20)*3 ) from your example, you have a linear relationship built into the y variable itself (with some noise) which allows your plot to track relatively well with the linear equation.

X = np.arange(1,21)

#y = np.power(X, 1) + np.random.rand(20)*3

w = np.linalg.lstsq(X.reshape(20, 1), y)[0]

plt.plot(X, y, 'red')
plt.plot(X, X*w[0], 'blue')
plt.show()

However, when you alternate to something like your y variable

    y = np.array([-0.00454712, -0.00457764, -0.0045166 , -0.00442505, -0.00427246,
       -0.00411987, -0.00378418, -0.003479  , -0.00314331, -0.00259399,
       -0.00213623, -0.00146484, -0.00082397, -0.00030518,  0.00027466,
        0.00076294,  0.00146484,  0.00192261,  0.00247192,  0.00314331])

You end up with something less easy to fit.

Looking at the documentation , if you are attempting to something that fits this set of values, you will need to build in a constant component in which case lstsq does not do by default. The docs state for lstsq

Return the least-squares solution to a linear matrix equation.

Solves the equation ax = b

If you really want to fit the data to a linear equation, running code like the below will give you something that almost matches your original data. However, the data behind this process seems to have polynomial/exponential driver which would make polyfit better.

X = np.arange(1,21)
y = np.array([-0.00454712, -0.00457764, -0.0045166 , -0.00442505, -0.00427246,
       -0.00411987, -0.00378418, -0.003479  , -0.00314331, -0.00259399,
       -0.00213623, -0.00146484, -0.00082397, -0.00030518,  0.00027466,
        0.00076294,  0.00146484,  0.00192261,  0.00247192,  0.00314331])

#y = np.power(X, 1) + np.random.rand(20)*3
X2 = np.vstack([X, np.ones(len(X))]).T
w = np.linalg.lstsq(X2, y)[0]


plt.plot(X, y, 'red')
plt.plot(X, X.dot(w[0])+w[1], 'blue')
plt.show()

Least squares not working for a set of y's

Question

1 answers

solution1
1 ACCPTED 2017-10-13 18:48:54

Least squares not working for a set of y's

Question

1 answers

solution1 1 ACCPTED 2017-10-13 18:48:54

solution1
1 ACCPTED 2017-10-13 18:48:54