How do I get R-squared for a robust regression (RLM) in Statsmodels?

Question

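When measuring goodness of fit, R-squared seems to be the commonly understood (and accepted) measure for "simple" linear models. But in statsmodels (as in other statistical software), RLM does not include R-squared in its regression results. Is there a way to compute it "manually", perhaps similar to how it is done in Stata?

Or is there another goodness-of-fit metric that can be used or calculated from the results produced by sm.RLM?

This is what Statsmodels is producing: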

import numpy as np
import statsmodels.api as sm

# Sample Data with outliers
nsample = 50
x = np.linspace(0, 20, nsample)
x = sm.add_constant(x)
sig = 0.3
beta = [5, 0.5]
y_true = np.dot(x, beta)
y = y_true + sig * 1. * np.random.normal(size=nsample)
y[[39,41,43,45,48]] -= 5   # add some outliers (10% of nsample)

# Regression with Robust Linear Model
res = sm.RLM(y, x).fit()
print(res.summary())

Which outputs:

                    Robust linear Model Regression Results                    
==============================================================================
Dep. Variable:                      y   No. Observations:                   50
Model:                            RLM   Df Residuals:                       48
Method:                          IRLS   Df Model:                            1
Norm:                          HuberT                                         
Scale Est.:                       mad                                         
Cov Type:                          H1                                         
Date:                 Mo, 27 Jul 2015                                         
Time:                        10:00:00                                         
No. Iterations:                    17                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          5.0254      0.091     55.017      0.000         4.846     5.204
x1             0.4845      0.008     61.555      0.000         0.469     0.500
==============================================================================

Answer 1

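Since OLS returns R2, I would suggest regressing the actual values on the fitted values with a simple linear regression. Wherever the fitted values come from, this approach gives you an indication of the corresponding R2.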

Answer 2

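Why not use model.predict to get r2? For example (here model has to be the fitted results object, such as the return value of sm.RLM(y, X).fit(), because that is the object whose predict accepts only X):

r2 = 1.0 - np.sum((model.predict(X) - y) ** 2) / np.sum((y - np.mean(y)) ** 2)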

Answer 3

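R2 is not a good measure of goodness of fit for RLM models. The problem is that outliers have such a large effect on the R2 value that it is entirely determined by them. Running a weighted regression afterwards is an attractive alternative, but it is better to look at the p-values, standard errors and confidence intervals of the estimated coefficients.

Compare the OLS summary with the RLM summary (the results differ slightly from yours because the random draws are different):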

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.726
Model:                            OLS   Adj. R-squared:                  0.721
Method:                 Least Squares   F-statistic:                     127.4
Date:                Wed, 03 Nov 2021   Prob (F-statistic):           4.15e-15
Time:                        09:33:40   Log-Likelihood:                -87.455
No. Observations:                  50   AIC:                             178.9
Df Residuals:                      48   BIC:                             182.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.7071      0.396     14.425      0.000       4.912       6.503
x1             0.3848      0.034     11.288      0.000       0.316       0.453
==============================================================================
Omnibus:                       23.499   Durbin-Watson:                   2.752
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               33.906
Skew:                          -1.649   Prob(JB):                     4.34e-08
Kurtosis:                       5.324   Cond. No.                         23.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

                    Robust linear Model Regression Results                    
==============================================================================
Dep. Variable:                      y   No. Observations:                   50
Model:                            RLM   Df Residuals:                       48
Method:                          IRLS   Df Model:                            1
Norm:                          HuberT                                         
Scale Est.:                       mad                                         
Cov Type:                          H1                                         
Date:                Wed, 03 Nov 2021                                         
Time:                        09:34:24                                         
No. Iterations:                    17                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.1857      0.111     46.590      0.000       4.968       5.404
x1             0.4790      0.010     49.947      0.000       0.460       0.498
==============================================================================

If the model instance has been used for another fit with different fit parameters, then the fit options might not be the correct ones anymore.

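You can see that the standard errors and the widths of the confidence intervals shrink for both the intercept and the slope when going from OLS to RLM. This suggests that the RLM estimates are closer to the true values (the data were generated with an intercept of 5 and a slope of 0.5).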
