How to get unscaled regression coefficient errors using statsmodels?

Question

I'm trying to compute the coefficient errors of a regression using statsmodels, also known as the standard errors of the parameter estimates. However, I need to compute their "unscaled" version, and so far I've only managed to do so with NumPy.

You can see the meaning of "unscaled" in the docs: https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html

cov bool or str, optional

    If given and not False, return not just the estimate but also its covariance matrix.
    By default, the covariance are scaled by chi2/dof, where dof = M - (deg + 1),
    i.e., the weights are presumed to be unreliable except in a relative sense and
    everything is scaled such that the reduced chi2 is unity. This scaling is omitted
    if cov='unscaled', as is relevant for the case that the weights are w = 1/sigma, with
    sigma known to be a reliable estimate of the uncertainty.

I'm using this data to run the rest of the code in this post:

import numpy as np
x = np.array([-0.841, -0.399, 0.599, 0.203, 0.527, 0.129, 0.703, 0.503])
y = np.array([1.01, 1.24, 1.09, 0.95, 1.02, 0.97, 1.01, 0.98])
sigmas = np.array([6872.26, 80.71, 47.97, 699.94, 57.55, 1561.54, 311.98, 501.08])
# The weight conventions differ: statsmodels expects 1/sigma**2, NumPy expects 1/sigma
sm_weights = np.array([1.0/sigma**2 for sigma in sigmas])
np_weights = np.array([1.0/sigma for sigma in sigmas])

With NumPy:

coefficients, cov = np.polyfit(x, y, deg=2, w=np_weights, cov='unscaled')
# The errors I need to get
print(np.sqrt(np.diag(cov))) # [917.57938013 191.2100413  211.29028248]

If I compute the regression using statsmodels:

from sklearn.preprocessing import PolynomialFeatures
import statsmodels.api as smapi

polynomial_features = PolynomialFeatures(degree=2)
polynomial = polynomial_features.fit_transform(x.reshape(-1, 1))
model = smapi.WLS(y, polynomial, weights=sm_weights)
regression = model.fit()

# Get coefficient errors
# Notice the [::-1], statsmodels returns the coefficients in the reverse order NumPy does
print(regression.bse[::-1]) # [0.24532856, 0.05112286, 0.05649161]

So the values I get are different, but related:

np_errors = np.sqrt(np.diag(cov))
sm_errors = regression.bse[::-1]
print(np_errors / sm_errors) # [3740.2061481, 3740.2061481, 3740.2061481]

The NumPy documentation says the covariance is scaled by chi2/dof, where dof = M - (deg + 1). So I tried the following:

degree = 2
model_predictions = np.polyval(coefficients, x)
residuals = (model_predictions - y)
chi_squared = np.sum(residuals**2)
degrees_of_freedom = len(x) - (degree + 1)
scale_factor = chi_squared / degrees_of_freedom

sm_cov = regression.cov_params()
unscaled_errors = np.sqrt(np.diag(sm_cov * scale_factor))[::-1] # [0.09848423, 0.02052266, 0.02267789]
unscaled_errors = np.sqrt(np.diag(sm_cov / scale_factor))[::-1] # [0.61112427, 0.12734931, 0.14072311]

What I notice is that the covariance matrix I get from NumPy is much larger than the one I get from statsmodels:

>>> cov
array([[ 841951.9188366 , -154385.61049538, -188456.18957375],
       [-154385.61049538,   36561.27989418,   31208.76422516],
       [-188456.18957375,   31208.76422516,   44643.58346933]])
>>> regression.cov_params()
array([[ 0.0031913 ,  0.00223093, -0.0134716 ],
       [ 0.00223093,  0.00261355, -0.0110361 ],
       [-0.0134716 , -0.0110361 ,  0.0601861 ]])

As long as I can't make them equivalent, I won't be able to get the same errors. Any idea what the difference in scale could mean, and how to make both covariance matrices equal?

Answer 1

The statsmodels documentation is not well organized in some parts. Here is a notebook with an example of what follows: https://www.statsmodels.org/devel/examples/notebooks/generated/chi2_fitting.html

The regression models in statsmodels, such as OLS and WLS, have an option to keep the scale fixed. This is the equivalent of cov="unscaled" in numpy and scipy. The statsmodels option is more general, because it allows fixing the scale at any user-defined value.

https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html
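To make the role of the scale concrete, here is a minimal NumPy-only sketch using the data from the question. It assumes the standard weighted least squares formulas: the unscaled covariance is inv(X' W X) with W = diag(1/sigma**2), and the estimated scale is the weighted chi2 divided by the residual degrees of freedom. The variable names are illustrative and not part of any library.

```python
import numpy as np

x = np.array([-0.841, -0.399, 0.599, 0.203, 0.527, 0.129, 0.703, 0.503])
y = np.array([1.01, 1.24, 1.09, 0.95, 1.02, 0.97, 1.01, 0.98])
sigmas = np.array([6872.26, 80.71, 47.97, 699.94, 57.55, 1561.54, 311.98, 501.08])

# Design matrix with columns [x**2, x, 1], the same order np.polyfit uses
X = np.vander(x, 3)
W = np.diag(1.0 / sigmas**2)

# Unscaled covariance: the scale is fixed at 1
unscaled_cov = np.linalg.inv(X.T @ W @ X)

# Weighted least squares estimate and weighted chi-squared
beta = unscaled_cov @ X.T @ W @ y
residuals = y - X @ beta
chi_squared = np.sum((residuals / sigmas) ** 2)
dof = len(x) - X.shape[1]
scale = chi_squared / dof

# Scaled covariance: the unscaled one multiplied by the estimated scale
scaled_cov = scale * unscaled_cov

# Compare against both np.polyfit variants
_, np_unscaled = np.polyfit(x, y, deg=2, w=1.0 / sigmas, cov="unscaled")
_, np_scaled = np.polyfit(x, y, deg=2, w=1.0 / sigmas, cov=True)

print(np.allclose(unscaled_cov, np_unscaled, rtol=1e-6))
print(np.allclose(scaled_cov, np_scaled, rtol=1e-6))
print(1.0 / np.sqrt(scale))  # ratio of unscaled to scaled standard errors
```

Note that chi_squared here uses the weighted residuals. The attempt in the question summed unweighted squared residuals, which would explain why neither multiplying nor dividing by its scale_factor recovered the NumPy errors. Under these assumptions, the last printed value should correspond to the constant ratio between the NumPy and statsmodels errors reported in the question.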

If we have a model as defined in the example, either OLS or WLS, then using

regression = model.fit(cov_type="fixed scale")

will keep the scale at 1, and the resulting covariance matrix is unscaled.

Using

regression = model.fit(cov_type="fixed scale", cov_kwds={"scale": 2})

will keep the scale fixed at the value 2.

(Some links to related discussion and motivation are in https://github.com/statsmodels/statsmodels/pull/2137)

Caution

The fixed scale cov_type will be used for inferential statistics that are based on the covariance of the parameter estimates, cov_params. This affects standard errors, t-tests, Wald tests, and confidence and prediction intervals.

However, some other results statistics might not be adjusted to use the fixed scale instead of the estimated scale, e.g. resid_pearson.

https://github.com/statsmodels/statsmodels/issues/8190
