
What's the easiest way to calculate the regression coefficient in Python?

I have a 100 by 1 response variable Y and a 100 by 3 predictor matrix X. I want to calculate the regression coefficient (X'X)^{-1}X'Y.

Currently I'm doing it as follows:

import numpy as np
from numpy.linalg import inv

invXpX = inv(np.dot(np.transpose(X), X))
XpY = np.dot(np.transpose(X), Y)
betahat = np.dot(invXpX, XpY)

This looks pretty cumbersome, whereas in MATLAB we can write it just like the original math formula: inv(X'*X)*X'*Y. Is there an easier way to calculate this regression coefficient in Python? Thanks!

Yes, it can be written more compactly, but note that this will not always improve your code or its readability.

The transpose of a NumPy array is available as the .T attribute. If you use numpy matrix instead of NumPy arrays you can also use .I for the inverse, but I would recommend sticking with ndarray (np.matrix is discouraged in recent NumPy releases). For matrix multiplication you can use the @ operator (Python 3.5+), so np.dot(X, Y), X.dot(Y), and X @ Y are all equivalent when X and Y are NumPy arrays.
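For example, a minimal sketch (using two small arrays of my own, not from the question) showing that the three spellings agree:

import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])

# All three spellings compute the same matrix product
print(np.allclose(np.dot(A, B), A.dot(B)))  # True
print(np.allclose(np.dot(A, B), A @ B))     # True
# Transpose via the .T attribute
print(A.T)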

import numpy as np

# Simulate data from a quadratic model y = a*x^2 + b*x + c
a, b, c = 1, 2, 3
x = np.arange(100)
# Add random noise to the y values so there is something to estimate
y = a*x**2 + b*x + c + np.random.randn(100)
# Build the [100 x 3] design matrix X
X = np.vstack([x**2, x, np.ones(x.shape)]).T

# Estimate the coefficients a, b, c
x_hat = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
# array([0.99998334, 2.00246583, 2.95697339])
x_hat = np.linalg.inv(X.T @ X) @ X.T @ y
# array([0.99998334, 2.00246583, 2.95697339])

# The same with numpy matrix:
X_mat = np.matrix(X)
x_hat = (X_mat.T @ X_mat).I @ X_mat.T @ y
# matrix([[0.99998334, 2.00246583, 2.95697339]])

# Without noise the coefficients are recovered exactly:
y = a*x**2 + b*x + c
x_hat = (X_mat.T @ X_mat).I @ X_mat.T @ y
# matrix([[1., 2., 3.]])
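As a side note (beyond what was asked), forming the explicit inverse is rarely needed: np.linalg.solve on the normal equations, or np.linalg.pinv, gives the same least-squares coefficients more compactly and with better numerical behavior. A minimal sketch reusing the X and y from above:

# Solve the normal equations (X'X) beta = X'y without an explicit inverse
x_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Or the most compact spelling, via the Moore-Penrose pseudoinverse
x_hat = np.linalg.pinv(X) @ y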

You can try this:

np.linalg.inv(X.T @ X) @ (X.T @ Y)

(Note that np.invert is NumPy's element-wise bitwise NOT, so the matrix inverse has to come from np.linalg.inv.)
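As a quick sanity check (a sketch, assuming X and Y are the arrays from the question), this one-liner should agree with NumPy's built-in least-squares solver:

beta_direct = np.linalg.inv(X.T @ X) @ (X.T @ Y)
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_direct, beta_lstsq))  # True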
