

Statsmodels: Plotting mean confidence intervals based on heteroscedasticity-consistent standard errors

This question is similar to "confidence and prediction intervals with StatsModels", but with an added nuance:

My data is heteroscedastic, and I would like to plot the confidence interval on the mean using any one of the heteroscedasticity-consistent standard errors that statsmodels provides (HC0_se, HC1_se, etc.). I can't find an easy way to get this for each fitted value (though it's easy to get the intervals for each coefficient). It also does not seem to be included in the results summary table in stats.outliers in the same way that the standard mean confidence interval data is.

Two questions: 两个问题:

  1. Does anyone have any idea how I can do this?
  2. What are the heteroscedasticity-consistent covariance matrices that are also available in the linear regression results object typically used for? Why are they made available?

Many thanks

I don't believe there's a way yet to specify which covariance matrix should be used to calculate the prediction standard errors. Note that the prediction code is still in the "sandbox" folder of the statsmodels repository. I'm sure GitHub pull requests would be welcome :)

In any case, this should be pretty simple to do. Here's a link to the under-the-hood code for the prediction function you linked to. Essentially, you just need to substitute the covariance matrix you want to use for the covb variable.

Then you can use the same matplotlib snippet you saw in the other SO post.

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/regression/predstd.py#L27

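import numpy as np
from scipy import stats

def wls_prediction_std_cov(res, covb, exog, predicted, weights=1., alpha=0.05):
    # Hypothetical adaptation of predstd.py: covb is the covariance to use, e.g. res.cov_HC1
    # (drop the mse_resid/weights term for a CI on the mean instead of a new observation)
    predvar = res.mse_resid/weights + (exog * np.dot(covb, exog.T).T).sum(1)
    predstd = np.sqrt(predvar)
    tppf = stats.t.isf(alpha/2., res.df_resid)
    interval_u = predicted + tppf * predstd
    interval_l = predicted - tppf * predstd
    return predstd, interval_l, interval_u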
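For reference, the quantities in that code written out, assuming row $i$ of `exog` is $x_i$, `covb` is $\hat\Sigma_\beta$, `res.mse_resid` is $\hat\sigma^2$, `weights` is $w_i$, and `res.df_resid` is $n-k$:

```latex
\begin{aligned}
\widehat{\operatorname{Var}}(\hat y_i) &= x_i^\top \hat\Sigma_\beta\, x_i
  && \text{(confidence interval on the mean)} \\
\widehat{\operatorname{Var}}(y_i^{\text{new}} - \hat y_i) &= \frac{\hat\sigma^2}{w_i} + x_i^\top \hat\Sigma_\beta\, x_i
  && \text{(prediction interval, \texttt{predvar})} \\
\text{interval} &= \hat y_i \pm t_{1-\alpha/2,\;n-k}\,\sqrt{\widehat{\operatorname{Var}}}
  && \text{(\texttt{tppf} = \texttt{stats.t.isf(alpha/2, df\_resid)})}
\end{aligned}
```

The code computes the second line; the question needs the first. Plugging in a robust $\hat\Sigma_\beta$ changes only the quadratic form $x_i^\top \hat\Sigma_\beta x_i$.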

Robust standard errors or covariances are not yet fully integrated into the models. They are currently mainly add-ons that are computed after the model has been estimated.

In the next release of statsmodels it will be possible to change the default covariance to any of the available robust covariance estimators; for OLS this is already in current master. All additional results, such as t_test, wald_test and so on, will then use whichever robust or nonrobust covariance has been defined as the default. Current version: http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLSResults.get_robustcov_results.html

For the prediction standard errors:

I think the calculations are the same when cov_params is a robust sandwich estimator, but I haven't verified that against Stata. See the last part of my answer in "Mathematical background of statsmodels wls_prediction_std".
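To make the "same calculation" point concrete, here is a NumPy-only sketch: the HC0 (White) sandwich $(X^\top X)^{-1} X^\top \operatorname{diag}(e_i^2)\, X (X^\top X)^{-1}$ is just another $k \times k$ covariance matrix, so it drops into the same row-wise quadratic form as `covb`, and HC1 is HC0 rescaled by $n/(n-k)$. The data are simulated and the function names are illustrative; this is not a check against Stata.

```python
import numpy as np


def sandwich_hc0(X, resid):
    """HC0 (White) covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1."""
    bread = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X  # sum_i e_i^2 x_i x_i'
    return bread @ meat @ bread


def mean_se(X, cov):
    """sqrt(x_i' cov x_i) for each row x_i: the quadratic form used in predstd.py."""
    return np.sqrt(np.einsum("ij,jk,ik->i", X, cov, X))


rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 + 0.3 * x)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
k = X.shape[1]

cov_hc0 = sandwich_hc0(X, resid)
cov_hc1 = cov_hc0 * n / (n - k)  # HC1: degrees-of-freedom rescaling of HC0
se_hc1 = mean_se(X, cov_hc1)
```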

So in statsmodels 0.5 it's not possible to get prediction errors with robust covariances directly; you need to copy the function and use the desired cov_params.

Why do we use robust covariances?

If there is heteroscedasticity or correlation across observations, OLS still gives consistent (or unbiased) parameter estimates, but the standard covariance matrix of those estimates is "wrong": it is inconsistent, so standard errors, confidence intervals and tests based on it are not valid. So we need a covariance matrix that is robust to heteroscedasticity, correlation, or both.
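A small simulation illustrating the point, as a sketch with arbitrary settings (error standard deviation equal to $x^2$, 200 observations, 2000 replications, fixed design): the slope estimates stay centered on the true value of 2, but the classical standard error understates their actual spread, while the HC1 sandwich roughly tracks it. Exact numbers depend on the seed.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 2000
x = rng.uniform(0, 1, n)  # fixed design across replications
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

slopes, se_classic, se_hc1 = [], [], []
for _ in range(reps):
    e = rng.normal(scale=x ** 2)  # error sd grows with x: heteroscedastic
    y = 1.0 + 2.0 * x + e
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)
    cov_classic = s2 * XtX_inv  # assumes a constant error variance
    meat = (X * resid[:, None] ** 2).T @ X
    cov_hc1 = XtX_inv @ meat @ XtX_inv * n / (n - 2)
    slopes.append(beta[1])
    se_classic.append(np.sqrt(cov_classic[1, 1]))
    se_hc1.append(np.sqrt(cov_hc1[1, 1]))

true_sd = np.std(slopes)  # actual sampling spread of the slope estimate
print(f"mean slope:           {np.mean(slopes):.3f} (true 2.0)")
print(f"sampling sd of slope: {true_sd:.4f}")
print(f"mean classical se:    {np.mean(se_classic):.4f}")
print(f"mean HC1 se:          {np.mean(se_hc1):.4f}")
```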

Many modern econometrics textbooks recommend always using robust covariance estimators when we are not sure about the correct specification of heteroscedasticity or of correlation across observations, which is almost always the case in economics.

The simplest case is just heteroscedasticity (http://en.wikipedia.org/wiki/Heteroscedasticity-consistent_standard_errors). In time series, however, we might have autocorrelation that is not included in the model, and in repeated-measures or panel data we often have correlation within clusters or panels. Robust covariances give us consistent standard errors in these cases.

The same applies to other models, for example cluster-robust standard errors in Poisson or Logit models, including when they are estimated with generalized estimating equations (GEE).

