
Get beta coefficients of a regression model in Python

import pandas as pd

dataset = pd.read_excel('dfmodel.xlsx')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

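# Predict on the held-out set so that y_pred is defined before scoring
y_pred = regressor.predict(X_test)

from sklearn.metrics import r2_score
print('The R2 score of Multi-Linear Regression model is: ', r2_score(y_test, y_pred))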

With the code above, I managed to run a linear regression and get the R2. How do I get the beta coefficient for each predictor variable?

Per the sklearn.linear_model.LinearRegression documentation page, the coefficients (slopes) and the intercept are available as regressor.coef_ and regressor.intercept_, respectively.
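As a minimal sketch of reading those two attributes: the data below is synthetic and noise-free (my assumption, so the true slopes are known), and the names x1 and x2 are hypothetical labels, not columns from the asker's spreadsheet.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (illustration only): y = 2*x1 - 3*x2 + 5
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 5.0

regressor = LinearRegression().fit(X, y)

# coef_ holds one slope per column of X, in column order
feature_names = ["x1", "x2"]  # hypothetical labels
coefs = dict(zip(feature_names, regressor.coef_))
for name, coef in coefs.items():
    print(name, coef)
print("intercept", regressor.intercept_)
```

With a DataFrame, dataset.columns[:-1] could play the role of feature_names, assuming the target is the last column as in the question.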

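If you apply sklearn.preprocessing.StandardScaler to the predictors before fitting the model, the regression coefficients should be the beta coefficients you are looking for.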
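One caveat worth hedging: StandardScaler only rescales X, while fully standardized betas are usually defined with y standardized as well. The sketch below, on made-up data, standardizes both and checks the result against rescaling the raw coefficients by std(x) / std(y). The data and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up data with predictors on very different scales
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) * np.array([1.0, 50.0])
y = 3.0 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(size=200)

# Raw (unstandardized) coefficients
raw_model = LinearRegression().fit(X, y)

# Standardize predictors and target, then refit
X_std = StandardScaler().fit_transform(X)
y_std = (y - y.mean()) / y.std()
betas = LinearRegression().fit(X_std, y_std).coef_

# Same betas obtained by rescaling the raw coefficients
betas_from_raw = raw_model.coef_ * X.std(axis=0) / y.std()
print(betas, betas_from_raw)
```

If only X is scaled, the coefficients come out in units of y per standard deviation of each predictor, which may or may not be what you mean by "beta".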

Personally, I prefer doing it in a single step with np.polyfit(), specifying degree 1.

import numpy as np
np.polyfit(X, y, 1)[0]  # slope (beta) for a 1-D X; higher degrees return more coefficients, highest power first

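So for your question, if I understand correctly that you want to compare the predicted y values against the original y, it would look like this: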

np.polyfit(y_test,y_pred,1)[0]

I would also test np.polyfit(x_test, y_pred, 1)[0], though.
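Note that np.polyfit expects a one-dimensional x, so it only covers the single-predictor case. A small sketch on synthetic data (my assumption, not the asker's dataset) showing that its degree-1 fit agrees with LinearRegression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic single-predictor data (illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.5 * x + 2.0 + rng.normal(scale=0.1, size=100)

# Degree-1 fit: coefficients come back highest power first
slope, intercept = np.polyfit(x, y, 1)

# Same model with scikit-learn, which needs a 2-D feature matrix
regressor = LinearRegression().fit(x.reshape(-1, 1), y)

print(slope, regressor.coef_[0])
print(intercept, regressor.intercept_)
```

For several predictors, as in the question, regressor.coef_ is the more direct route.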

Use regressor.coef_. By comparing against the statsmodels implementation, you can see that these coefficients map to the predictors in column order:

from sklearn.linear_model import LinearRegression

regressor = LinearRegression(fit_intercept=False)
regressor.fit(X, y)

regressor.coef_ 
# array([0.43160901, 0.42441214])

The statsmodels version:

import statsmodels.api as sm

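# Note: sm.add_constant(X) returns a new array rather than modifying X in place.
# No constant is added here, so the fit has no intercept, matching fit_intercept=False above.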
mod = sm.OLS(y, X)
res = mod.fit()

print(res.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.624
Model:                            OLS   Adj. R-squared (uncentered):              0.623
Method:                 Least Squares   F-statistic:                              414.0
Date:                Tue, 29 Sep 2020   Prob (F-statistic):                   1.25e-106
Time:                        17:03:27   Log-Likelihood:                         -192.54
No. Observations:                 500   AIC:                                      389.1
Df Residuals:                     498   BIC:                                      397.5
Df Model:                           2                                                  
Covariance Type:            nonrobust                                                  
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.4316      0.041     10.484      0.000       0.351       0.512
x2             0.4244      0.041     10.407      0.000       0.344       0.505
==============================================================================
Omnibus:                       36.830   Durbin-Watson:                   1.967
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               13.197
Skew:                           0.059   Prob(JB):                      0.00136
Kurtosis:                       2.213   Cond. No.                         2.57
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

You can test the equivalence directly with:

np.array([regressor.coef_.round(8) == res.params.round(8)]).all() # True
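Comparing rounded values can occasionally fail when two nearly equal numbers fall on either side of a rounding boundary, so a tolerance-based check such as np.allclose may be more robust. A sketch on synthetic data, with np.linalg.lstsq standing in for the statsmodels no-intercept fit (both the data and the stand-in are my assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data shaped like the example: 500 observations, 2 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 0.4 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.3, size=500)

regressor = LinearRegression(fit_intercept=False).fit(X, y)

# Plain least squares without an intercept column (stand-in for sm.OLS(y, X))
params, *_ = np.linalg.lstsq(X, y, rcond=None)

# Tolerance-based comparison instead of rounding both sides
coefficients_match = np.allclose(regressor.coef_, params)
print(coefficients_match)
```

With the statsmodels result from above, the equivalent check would be np.allclose(regressor.coef_, res.params).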

