How to get the unscaled regression coefficient errors using statsmodels?

I'm trying to compute the coefficient errors of a regression (also known as the standard errors of the parameter estimates) using statsmodels, but I need their "unscaled" version. So far I've only managed to do this with NumPy.

You can see the meaning of "unscaled" in the docs: https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html

cov bool or str, optional

    If given and not False, return not just the estimate but also its covariance matrix.
    By default, the covariance are scaled by chi2/dof, where dof = M - (deg + 1),
    i.e., the weights are presumed to be unreliable except in a relative sense and
    everything is scaled such that the reduced chi2 is unity. This scaling is omitted
    if cov='unscaled', as is relevant for the case that the weights are w = 1/sigma, with
    sigma known to be a reliable estimate of the uncertainty.

I'm using this data to run the rest of the code in this post:

import numpy as np
x = np.array([-0.841, -0.399, 0.599, 0.203, 0.527, 0.129, 0.703, 0.503])
y = np.array([1.01, 1.24, 1.09, 0.95, 1.02, 0.97, 1.01, 0.98])
sigmas = np.array([6872.26, 80.71, 47.97, 699.94, 57.55, 1561.54, 311.98, 501.08])
# The weight conventions differ: statsmodels expects 1/sigma**2, NumPy expects 1/sigma
sm_weights = 1.0 / sigmas**2
np_weights = 1.0 / sigmas

With NumPy:

coefficients, cov = np.polyfit(x, y, deg=2, w=np_weights, cov='unscaled')
# The errors I need to get
print(np.sqrt(np.diag(cov))) # [917.57938013 191.2100413  211.29028248]

If I compute the regression using statsmodels:

from sklearn.preprocessing import PolynomialFeatures
import statsmodels.api as smapi

polynomial_features = PolynomialFeatures(degree=2)
polynomial = polynomial_features.fit_transform(x.reshape(-1, 1))
model = smapi.WLS(y, polynomial, weights=sm_weights)
regression = model.fit()

# Get coefficient errors
# Notice the [::-1], statsmodels returns the coefficients in the reverse order NumPy does
print(regression.bse[::-1]) # [0.24532856, 0.05112286, 0.05649161]

So the values I get are different, but related:

np_errors = np.sqrt(np.diag(cov))
sm_errors = regression.bse[::-1]
print(np_errors / sm_errors) # [3740.2061481, 3740.2061481, 3740.2061481]

The NumPy documentation says the covariance is scaled by chi2/dof, where dof = M - (deg + 1). So I tried the following:

degree = 2
model_predictions = np.polyval(coefficients, x)
residuals = (model_predictions - y)
chi_squared = np.sum(residuals**2)
degrees_of_freedom = len(x) - (degree + 1)
scale_factor = chi_squared / degrees_of_freedom

sm_cov = regression.cov_params()
unscaled_errors = np.sqrt(np.diag(sm_cov * scale_factor))[::-1] # [0.09848423, 0.02052266, 0.02267789]
unscaled_errors = np.sqrt(np.diag(sm_cov / scale_factor))[::-1] # [0.61112427, 0.12734931, 0.14072311]

What I notice is that the covariance matrix I get from NumPy is much larger than the one I get from statsmodels:

>>> cov
array([[ 841951.9188366 , -154385.61049538, -188456.18957375],
       [-154385.61049538,   36561.27989418,   31208.76422516],
       [-188456.18957375,   31208.76422516,   44643.58346933]])
>>> regression.cov_params()
array([[ 0.0031913 ,  0.00223093, -0.0134716 ],
       [ 0.00223093,  0.00261355, -0.0110361 ],
       [-0.0134716 , -0.0110361 ,  0.0601861 ]])

As long as I can't make them equivalent, I won't be able to get the same errors. Any idea of what the difference in scale could mean and how to make both covariance matrices equal?

The statsmodels documentation is not well organized in some parts. Here is a notebook with an example of what follows: https://www.statsmodels.org/devel/examples/notebooks/generated/chi2_fitting.html

The regression models in statsmodels, such as OLS and WLS, have an option to keep the scale fixed. This is equivalent to cov="unscaled" in NumPy (and to absolute_sigma=True in SciPy's curve_fit). The statsmodels option is more general because it allows fixing the scale at any user-defined value.

https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html
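To make the meaning of "scale" concrete, here is a minimal NumPy-only sketch using the data from the question. It assumes the unscaled covariance is inv(X'WX) with W = diag(1/sigma**2), and that the default scaling multiplies it by the weighted chi2/dof. The variable names are my own. Note that the residuals have to be divided by sigma before squaring; an unweighted chi2 gives a different factor.

```python
import numpy as np

x = np.array([-0.841, -0.399, 0.599, 0.203, 0.527, 0.129, 0.703, 0.503])
y = np.array([1.01, 1.24, 1.09, 0.95, 1.02, 0.97, 1.01, 0.98])
sigmas = np.array([6872.26, 80.71, 47.97, 699.94, 57.55, 1561.54, 311.98, 501.08])
degree = 2

# Design matrix with columns x**2, x, 1 (the same order np.polyfit uses).
X = np.vander(x, degree + 1)
W = np.diag(1.0 / sigmas**2)

# Unscaled covariance of the weighted least-squares estimate (assumption: inv(X'WX)).
unscaled_cov = np.linalg.inv(X.T @ W @ X)

# Estimated scale: weighted chi-squared divided by the degrees of freedom.
coefficients = unscaled_cov @ X.T @ W @ y
weighted_residuals = (y - X @ coefficients) / sigmas
scale = np.sum(weighted_residuals**2) / (len(x) - (degree + 1))
scaled_cov = scale * unscaled_cov

# Compare against both covariance flavours returned by np.polyfit.
_, np_unscaled = np.polyfit(x, y, deg=degree, w=1.0 / sigmas, cov="unscaled")
_, np_scaled = np.polyfit(x, y, deg=degree, w=1.0 / sigmas, cov=True)
print(np.allclose(unscaled_cov, np_unscaled))
print(np.allclose(scaled_cov, np_scaled))
```

If those assumptions hold, both checks print True, and 1/sqrt(scale) should be the constant ratio seen between the NumPy and statsmodels errors in the question.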

If we have a model as defined in the question, either OLS or WLS, then using

regression = model.fit(cov_type="fixed scale")

will keep the scale at 1 and the resulting covariance matrix is unscaled.

Using

regression = model.fit(cov_type="fixed scale", cov_kwds={"scale": 2})

will keep the scale fixed at the value 2.

(Some links to the related discussion and motivation are in https://github.com/statsmodels/statsmodels/pull/2137.)

Caution

The fixed scale cov_type is used for all inferential statistics that are based on the covariance of the parameter estimates, cov_params. This affects standard errors, t-tests, Wald tests, and confidence and prediction intervals.

However, some other result statistics, e.g. resid_pearson, might not be adjusted and may still use the estimated scale instead of the fixed scale.

https://github.com/statsmodels/statsmodels/issues/8190
