Getting the beta coefficients of a regression model in Python

Question

import pandas as pd

dataset = pd.read_excel('dfmodel.xlsx')
X = dataset.iloc[:, :-1].values  # all columns except the last are predictors
y = dataset.iloc[:, -1].values   # the last column is the target

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)  # predictions needed for the R2 score below

from sklearn.metrics import r2_score
print('The R2 score of Multi-Linear Regression model is: ', r2_score(y_test, y_pred))

Using the code above, I managed to run a linear regression and get the R2. How do I get the beta coefficient of each predictor?

Answer 1

According to the sklearn.linear_model.LinearRegression documentation page, the coefficients (slopes) and the intercept are available as regressor.coef_ and regressor.intercept_ respectively.
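As a minimal sketch of reading these attributes, the snippet below fits a model on a small synthetic DataFrame and pairs each coefficient with its column name. The column names `x1`, `x2` and the data are made up for illustration; `coef_` follows the column order of the training matrix.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only: y = 2*x1 - 3*x2 + 5 (no noise).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
df["y"] = 2.0 * df["x1"] - 3.0 * df["x2"] + 5.0

X = df[["x1", "x2"]]
y = df["y"]

regressor = LinearRegression().fit(X, y)

# coef_ is ordered like the columns of X, so zip the two together.
coefficients = dict(zip(X.columns, regressor.coef_))
intercept = regressor.intercept_
print(coefficients, intercept)
```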

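If you standardize the predictors with sklearn.preprocessing.StandardScaler before fitting the model, the regression coefficients are expressed per standard deviation of each predictor. If the target is standardized as well, they are the standardized (beta) coefficients you are looking for.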
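A rough sketch of the standardized case, on synthetic data invented for illustration: both the predictors and the target are scaled, and the result is checked against the usual conversion from raw coefficients (raw coefficient times sd of x divided by sd of y). In a real workflow the scaler would normally be fitted on the training split only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: two predictors on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * np.array([1.0, 10.0])
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)

# Standardize predictors and target (zero mean, unit variance).
X_std = StandardScaler().fit_transform(X)
y_std = StandardScaler().fit_transform(y.reshape(-1, 1)).ravel()

# Coefficients of the standardized fit are the beta coefficients.
beta = LinearRegression().fit(X_std, y_std).coef_

# Cross-check: rescale the raw coefficients by sd(x) / sd(y).
raw = LinearRegression().fit(X, y).coef_
beta_from_raw = raw * X.std(axis=0) / y.std()
print(beta, beta_from_raw)
```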

Answer 2

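Personally, I prefer the single step of np.polyfit() with degree 1.

import numpy as np
np.polyfit(X, y, 1)[0]  # slope (beta); x must be 1-D, so this covers a single predictor only

If I understand your question correctly, you want to relate the predicted y values to the original y values, which would look like this:

np.polyfit(y_test, y_pred, 1)[0]

That said, I would test np.polyfit(X_test, y_pred, 1)[0] instead, which again needs a single-column X_test flattened to 1-D.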
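To make the single-predictor restriction concrete, here is a small sketch on synthetic data (invented for illustration). `np.polyfit` with degree 1 returns the slope first and the intercept second, and the result matches an ordinary least-squares fit on the design matrix `[x, 1]`.

```python
import numpy as np

# Synthetic single-predictor data: y is roughly 1.5*x + 4.
rng = np.random.default_rng(0)
x = rng.normal(size=100)  # must be 1-D for np.polyfit
y = 1.5 * x + 4.0 + rng.normal(scale=0.1, size=100)

# Degree-1 fit: coefficients come back highest power first.
slope, intercept = np.polyfit(x, y, 1)

# Same fit via least squares on the columns [x, 1].
A = np.column_stack([x, np.ones_like(x)])
ols_slope, ols_intercept = np.linalg.lstsq(A, y, rcond=None)[0]
print(slope, intercept)
```

With several predictors, `np.polyfit` does not apply; `LinearRegression.coef_` or a least-squares solve on the full matrix is needed instead.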

Answer 3

Use regressor.coef_. By comparing with the statsmodels implementation, you can see how these coefficients map to the predictors in order:

from sklearn.linear_model import LinearRegression

regressor = LinearRegression(fit_intercept=False)
regressor.fit(X, y)

regressor.coef_ 
# array([0.43160901, 0.42441214])

The statsmodels version:

import statsmodels.api as sm

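# No constant column is added, matching fit_intercept=False above.
# (sm.add_constant(X) returns a new array; calling it without assigning the result has no effect.)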
mod = sm.OLS(y, X)
res = mod.fit()

print(res.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.624
Model:                            OLS   Adj. R-squared (uncentered):              0.623
Method:                 Least Squares   F-statistic:                              414.0
Date:                Tue, 29 Sep 2020   Prob (F-statistic):                   1.25e-106
Time:                        17:03:27   Log-Likelihood:                         -192.54
No. Observations:                 500   AIC:                                      389.1
Df Residuals:                     498   BIC:                                      397.5
Df Model:                           2                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.4316      0.041     10.484      0.000       0.351       0.512
x2             0.4244      0.041     10.407      0.000       0.344       0.505
==============================================================================
Omnibus:                       36.830   Durbin-Watson:                   1.967
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               13.197
Skew:                           0.059   Prob(JB):                      0.00136
Kurtosis:                       2.213   Cond. No.                         2.57
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

You can test the equivalence directly with:

np.array([regressor.coef_.round(8) == res.params.round(8)]).all() # True
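Comparing rounded values with `==` can occasionally fail when two nearly identical numbers fall on either side of a rounding boundary, so a tolerance-based check such as `np.allclose` may be a safer alternative. The sketch below uses synthetic data and `np.linalg.lstsq` as a stand-in for the statsmodels fit, so that it only needs NumPy and scikit-learn; both choices are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, two predictors, no intercept in the true model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = X @ np.array([0.4, 0.4]) + rng.normal(scale=0.3, size=500)

# scikit-learn fit without an intercept.
sk_coef = LinearRegression(fit_intercept=False).fit(X, y).coef_

# Plain least-squares solve of the same problem.
np_coef = np.linalg.lstsq(X, y, rcond=None)[0]

# Tolerance-based comparison instead of rounding and ==.
same = np.allclose(sk_coef, np_coef)
print(same)
```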

3 answers

Answer 1
0 2020-09-29 23:53:21

Answer 2
0 2021-02-24 13:54:39

Answer 3
-1 Accepted 2020-09-30 00:06:30
