
How to get the unscaled regression coefficient errors using statsmodels?

I'm trying to compute the coefficient errors of a regression (also known as the standard errors of the parameter estimates) using statsmodels, but I need their "unscaled" version. So far I've only managed to do this with NumPy.

You can see what "unscaled" means in the numpy.polyfit docs: https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html

cov bool or str, optional

    If given and not False, return not just the estimate but also its covariance matrix.
    By default, the covariance are scaled by chi2/dof, where dof = M - (deg + 1),
    i.e., the weights are presumed to be unreliable except in a relative sense and
    everything is scaled such that the reduced chi2 is unity. This scaling is omitted
    if cov='unscaled', as is relevant for the case that the weights are w = 1/sigma, with
    sigma known to be a reliable estimate of the uncertainty.
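In matrix form, and using my own notation rather than the docstring's (X is the Vandermonde design matrix, W = diag(1/σᵢ²), so the polyfit weights are wᵢ = 1/σᵢ), the two options described above amount to:

```latex
\begin{aligned}
\mathrm{Cov}_{\text{unscaled}} &= \left(X^\top W X\right)^{-1}, \qquad W = \operatorname{diag}\!\left(1/\sigma_i^{2}\right),\\
\mathrm{Cov}_{\text{scaled}} &= \frac{\chi^{2}}{\text{dof}}\,\mathrm{Cov}_{\text{unscaled}}, \qquad
\chi^{2} = \sum_{i=1}^{M} \frac{\left(y_i - \hat{y}_i\right)^{2}}{\sigma_i^{2}}, \qquad
\text{dof} = M - (\text{deg} + 1).
\end{aligned}
```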

I'm using this data to run the rest of the code in this post:

import numpy as np
x = np.array([-0.841, -0.399, 0.599, 0.203, 0.527, 0.129, 0.703, 0.503])
y = np.array([1.01, 1.24, 1.09, 0.95, 1.02, 0.97, 1.01, 0.98])
sigmas = np.array([6872.26, 80.71, 47.97, 699.94, 57.55, 1561.54, 311.98, 501.08])
# The weight conventions differ: statsmodels' WLS expects inverse variances (1/sigma**2),
# while np.polyfit's w multiplies the unsquared residuals (1/sigma)
sm_weights = 1.0 / sigmas**2
np_weights = 1.0 / sigmas
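To make the weight convention concrete: np.polyfit minimizes Σ(wᵢ(yᵢ − p(xᵢ)))², while statsmodels' WLS minimizes Σ weightsᵢ(yᵢ − ŷᵢ)², so w = 1/σ and weights = 1/σ² describe the same fit. A NumPy-only sketch that checks this by solving the WLS normal equations by hand; np.vander is swapped in for PolynomialFeatures (an assumption of this example, chosen to avoid the sklearn dependency):

```python
import numpy as np

x = np.array([-0.841, -0.399, 0.599, 0.203, 0.527, 0.129, 0.703, 0.503])
y = np.array([1.01, 1.24, 1.09, 0.95, 1.02, 0.97, 1.01, 0.98])
sigmas = np.array([6872.26, 80.71, 47.97, 699.94, 57.55, 1561.54, 311.98, 501.08])

sm_weights = 1.0 / sigmas**2  # WLS: multiplies squared residuals
np_weights = 1.0 / sigmas     # polyfit: multiplies unsquared residuals

# Solve (X^T W X) beta = X^T W y with columns ordered [1, x, x**2]
X = np.vander(x, 3, increasing=True)
XtW = X.T * sm_weights        # same as X.T @ np.diag(sm_weights)
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# np.polyfit returns the highest power first, so reverse it before comparing
beta_polyfit = np.polyfit(x, y, deg=2, w=np_weights)[::-1]

print(np.allclose(beta_wls, beta_polyfit))
```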

With NumPy:

coefficients, cov = np.polyfit(x, y, deg=2, w=np_weights, cov='unscaled')
# The errors I need to get
print(np.sqrt(np.diag(cov))) # [917.57938013 191.2100413  211.29028248]
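For reference, a sketch of what these two covariance options correspond to, following the docstring quoted above: the inverse of XᵀWX built from the weighted Vandermonde matrix, optionally multiplied by the weighted chi2/dof. The helper name polyfit_covariances is hypothetical, made up for this example:

```python
import numpy as np

def polyfit_covariances(x, y, deg, w):
    """Hypothetical helper: recompute np.polyfit's covariance options by hand.

    Returns (coefficients, unscaled_cov, scaled_cov), highest power first.
    """
    A = np.vander(x, deg + 1) * w[:, None]  # each row scaled by w_i
    b = y * w
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    unscaled = np.linalg.inv(A.T @ A)        # (X^T W X)^-1 with W = diag(w**2)
    chi2 = np.sum((b - A @ coef) ** 2)       # weighted chi-squared
    dof = len(x) - (deg + 1)
    return coef, unscaled, unscaled * chi2 / dof

x = np.array([-0.841, -0.399, 0.599, 0.203, 0.527, 0.129, 0.703, 0.503])
y = np.array([1.01, 1.24, 1.09, 0.95, 1.02, 0.97, 1.01, 0.98])
sigmas = np.array([6872.26, 80.71, 47.97, 699.94, 57.55, 1561.54, 311.98, 501.08])

coef, cov_unscaled, cov_scaled = polyfit_covariances(x, y, 2, 1.0 / sigmas)
print(np.sqrt(np.diag(cov_unscaled)))
```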

If I compute the regression using statsmodels:

from sklearn.preprocessing import PolynomialFeatures
import statsmodels.api as smapi

polynomial_features = PolynomialFeatures(degree=2)
polynomial = polynomial_features.fit_transform(x.reshape(-1, 1))
model = smapi.WLS(y, polynomial, weights=sm_weights)
regression = model.fit()

# Get coefficient errors
# Notice the [::-1]: PolynomialFeatures orders the columns as [1, x, x**2], while NumPy returns the highest power first
print(regression.bse[::-1]) # [0.24532856, 0.05112286, 0.05649161]

So the values I get are different, but related:

np_errors = np.sqrt(np.diag(cov))
sm_errors = regression.bse[::-1]
print(np_errors / sm_errors) # [3740.2061481, 3740.2061481, 3740.2061481]

The NumPy documentation says the covariance is scaled by chi2/dof, where dof = M - (deg + 1). So I tried the following:

degree = 2
model_predictions = np.polyval(coefficients, x)
residuals = model_predictions - y
chi_squared = np.sum(residuals**2)  # note: unweighted residuals
degrees_of_freedom = len(x) - (degree + 1)
scale_factor = chi_squared / degrees_of_freedom

sm_cov = regression.cov_params()
# Attempt 1: multiply by the scale factor
unscaled_errors = np.sqrt(np.diag(sm_cov * scale_factor))[::-1] # [0.09848423, 0.02052266, 0.02267789]
# Attempt 2: divide by the scale factor
unscaled_errors = np.sqrt(np.diag(sm_cov / scale_factor))[::-1] # [0.61112427, 0.12734931, 0.14072311]

I notice that the covariance matrix I get from NumPy is much larger than the one I get from statsmodels:

>>> cov
array([[ 841951.9188366 , -154385.61049538, -188456.18957375],
       [-154385.61049538,   36561.27989418,   31208.76422516],
       [-188456.18957375,   31208.76422516,   44643.58346933]])
>>> regression.cov_params()
array([[ 0.0031913 ,  0.00223093, -0.0134716 ],
       [ 0.00223093,  0.00261355, -0.0110361 ],
       [-0.0134716 , -0.0110361 ,  0.0601861 ]])

Unless I can make them equivalent, I won't get the same errors. Any idea what the difference in scale means, and how to make both covariance matrices equal?

The statsmodels documentation is not well organized in some parts. Here is a notebook with an example of the following: https://www.statsmodels.org/devel/examples/notebooks/generated/chi2_fitting.html

Regression models in statsmodels, such as OLS and WLS, have an option to keep the scale fixed. This is equivalent to cov="unscaled" in NumPy's polyfit (in SciPy, curve_fit's absolute_sigma=True plays the same role). The statsmodels option is more general, because it allows fixing the scale at any user-defined value.

The available cov_type options, including "fixed scale", are documented under get_robustcov_results: https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html

If we have a model as defined in the example, either OLS or WLS, then using

regression = model.fit(cov_type="fixed scale")

will keep the scale at 1, so the resulting covariance matrix is unscaled.

Using

regression = model.fit(cov_type="fixed scale", cov_kwds={"scale": 2})

will keep the scale fixed at a value of two.

(Some links to the related discussion and motivation are in https://github.com/statsmodels/statsmodels/pull/2137.)

Caution

The fixed scale cov_type is used for the inferential statistics that are based on the covariance of the parameter estimates, cov_params. This affects standard errors, t-tests, Wald tests, and confidence and prediction intervals.

However, some other result statistics might not be adjusted to use the fixed scale instead of the estimated scale, e.g. resid_pearson:

https://github.com/statsmodels/statsmodels/issues/8190

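Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site or link to the original. For any questions, contact: yoyou2525@163.com.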

 