
Least Squares method in practice

Very simple regression task. I have three variables x1, x2, x3 with some random noise, and I know the target equation: y = q1*x1 + q2*x2 + q3*x3 . Now I want to find the target coefficients q1, q2, q3 and evaluate the performance of our prediction methods using the mean Relative Squared Error (RSE), (Prediction/Real - 1)^2 .

In my research, I see that this is an ordinary Least Squares problem. But I can't work out from the examples on the internet how to solve this particular problem in Python. Let's say I have the data:

import numpy as np

sourceData = np.random.rand(1000, 3)
koefs = np.array([1, 2, 3])
target = np.dot(sourceData, koefs)  # y = 1*x1 + 2*x2 + 3*x3

(In real life the data are noisy and not normally distributed.) How do I find these koefs using a Least Squares approach in Python? Any library is fine.
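For concreteness, the metric I mean is this (a one-line sketch; the name mean_rse is just mine):

def mean_rse(prediction, real):
    # mean Relative Squared Error: average of (prediction/real - 1)^2
    return np.mean((prediction / real - 1) ** 2)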

@ayhan made a valuable comment.

And there is a problem with your code: there is actually no noise in the measurements you generate. The input data is random, but after the multiplication you don't add any noise on top of it, so the coefficients can be recovered exactly.

I've added some noise to your measurements and used the least squares formula to fit the parameters. Here's my code:

import numpy as np

data = np.random.rand(1000, 3)

true_theta = np.array([1, 2, 3])
true_measurements = np.dot(data, true_theta)

# noise on the measurements, uniform in [0, 1); change the factor to scale it
noise = np.random.rand(1000) * 1

noisy_measurements = true_measurements + noise

# normal equations: theta = (X^T X)^-1 X^T y
estimated_theta = np.linalg.inv(data.T @ data) @ data.T @ noisy_measurements

The estimated_theta will be close to true_theta . If you don't add noise to the measurements, they will be equal up to floating-point precision.
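A quick check of that claim (note that np.random.rand noise has mean 0.5 rather than 0, so without an intercept term the estimate carries a small systematic bias):

print(true_theta)       # [1 2 3]
print(estimated_theta)  # roughly [1, 2, 3], shifted slightly by the noise mean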

I've used the Python 3 matrix multiplication operator @ . You could use np.dot instead, but that makes the code longer, so I've split the formula:

MTM_inv = np.linalg.inv(np.dot(data.T, data))  # (X^T X)^-1
MTy = np.dot(data.T, noisy_measurements)       # X^T y
estimated_theta = np.dot(MTM_inv, MTy)
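As a side note, you rarely need the explicit inverse; a sketch of the same normal equations using np.linalg.solve is numerically safer:

# solve (X^T X) theta = X^T y directly instead of forming the inverse
estimated_theta = np.linalg.solve(data.T @ data, data.T @ noisy_measurements)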

You can read up on least squares here: https://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)#The_general_problem

UPDATE:

Or you could just use the built-in least squares function. Note that np.linalg.lstsq returns a tuple (solution, residuals, rank, singular values), so take the first element:

estimated_theta, *_ = np.linalg.lstsq(data, noisy_measurements, rcond=None)

In addition to @lhk's answer, I have found SciPy's great least_squares function. It is easy to get the requested behavior with it.

This way we can provide a custom function that returns residuals, and form the Relative Squared Error instead of the absolute squared difference:

import numpy as np
from scipy.optimize import least_squares

data = np.random.rand(1000, 3)

true_theta = np.array([1, 2, 3])
true_measurements = np.dot(data, true_theta)

noise = np.random.rand(1000) * 1

noisy_measurements = true_measurements + noise
# Uncomment this outlier to see how much better the Relative Squared Error
# estimator works than the default absolute difference for this case:
# noisy_measurements[-1] = data[-1] @ (1000 * true_theta)


def my_func(params, x, y):
    # relative residuals; changing this line to (x @ params) - y
    # gives the same result as np.linalg.lstsq
    return (x @ params) / y - 1


x0 = np.ones(3)  # initial guess for the parameters (missing in the original snippet)
res = least_squares(my_func, x0, args=(data, noisy_measurements))
estimated_theta = res.x
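Since the residuals returned here are already (prediction/real - 1), the mean Relative Squared Error of the fit can be read straight off the result object:

print(np.mean(res.fun ** 2))  # res.fun holds the residuals at the solution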

Also, we can provide a custom loss with the loss argument: a function that processes the residuals and forms the final loss, as sketched below.
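For instance, a minimal sketch under SciPy's callable-loss contract (the callable receives z = residuals**2 and must return rho(z) together with its first two derivatives, stacked in a (3, m) array); this particular rho reproduces the built-in 'cauchy' loss:

def my_loss(z):
    # z is the array of squared residuals
    rho = np.empty((3, z.size))
    rho[0] = np.log1p(z)            # rho(z)
    rho[1] = 1.0 / (1.0 + z)        # rho'(z)
    rho[2] = -1.0 / (1.0 + z)**2    # rho''(z)
    return rho

res_robust = least_squares(my_func, x0, args=(data, noisy_measurements), loss=my_loss)

Alternatively, the built-in robust losses such as loss='soft_l1' or loss='cauchy' cover the common cases without writing the derivatives by hand.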
