ggplot2: how to get robust confidence intervals for predictions in geom_smooth?

Question

Consider this simple example:

library(tidyverse)

dataframe <- tibble(x = c(1,2,3,4,5,6),   ## tibble() replaces the deprecated data_frame()
                    y = c(12,24,24,34,12,15))
> dataframe
# A tibble: 6 x 2
      x     y
  <dbl> <dbl>
1     1    12
2     2    24
3     3    24
4     4    34
5     5    12
6     6    15    

dataframe %>% ggplot(., aes(x = x, y = y)) + 
geom_point() + 
geom_smooth(method = 'lm', formula = y~x)

Here the standard errors are computed with the default option. However, I would like to use the robust variance-covariance matrix available in the sandwich and lmtest packages.

That is, using vcovHC(mymodel, "HC3").

Is there a simple way to get that using the geom_smooth() function?

Answer 1

UPDATE (2021-03-17): It was recently pointed out to me that the ggeffects package handles different VCOVs automatically, including the trickier HAC case that I originally demonstrated below. Quick example of the latter:

library(ggeffects)
library(sandwich)  ## For HAC and other robust VCOVs

d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))

reg1 <- lm(y ~ x, data = d)

plot(ggpredict(reg1, "x", vcov.fun = "vcovHAC"))
#> Loading required namespace: ggplot2

## This gives you a regular ggplot2 object. So you can add layers as you
## normally would. E.g. If you'd like to compare with the original data...
library(ggplot2)
last_plot() +
    geom_point(data = d, aes(x, y)) +
    labs(caption = 'Shaded region indicates HAC 95% CI.')

Created on 2021-03-17 by the reprex package (v1.0.0)

My original answer follows below...

HC robust SEs (simple)

This is easily done now thanks to the estimatr package and its family of lm_robust functions. E.g.:

library(tidyverse)
library(estimatr)

d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))

d %>% 
  ggplot(aes(x = x, y = y)) + 
  geom_point() + 
  geom_smooth(method = 'lm_robust', formula = y~x, fill="#E41A1C") + ## Robust (HC) SEs
  geom_smooth(method = 'lm', formula = y~x, col = "grey50") + ## Just for comparison
  labs(
    title = "Plotting HC robust SEs in ggplot2",
    subtitle = "Regular SEs in grey for comparison"
    ) +
  theme_minimal()

Created on 2020-03-08 by the reprex package (v0.3.0)
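The question asked for HC3 specifically, whereas lm_robust() uses HC2 by default. A minimal sketch of one way to request it, assuming geom_smooth() forwards its method.args list to lm_robust() unchanged (the object name p_hc3 is just for illustration):

```r
library(ggplot2)
library(estimatr)

d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))

## Assumption: method.args is handed straight to lm_robust(), so se_type
## selects the HC variant that the ribbon is based on.
p_hc3 <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm_robust", formula = y ~ x,
              method.args = list(se_type = "HC3"),
              fill = "#E41A1C") ## Robust (HC3) SEs

p_hc3
```

If the ribbon looks no different from the HC2 version, it is worth fitting lm_robust(y ~ x, data = d, se_type = "HC3") directly and comparing its standard errors, to confirm the argument is being picked up.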

HAC robust SEs (a bit more legwork)

The one caveat is that estimatr does not yet offer support for HAC (i.e. heteroscedasticity and autocorrelation consistent) SEs a la Newey-West. However, it is possible to obtain these manually with the sandwich package... which is kind of what the original question was asking anyway. You can then plot them using geom_ribbon().

I'll say for the record that HAC SEs don't make much sense for this particular data set. But here's an example of how you could do it, riffing off this excellent SO answer on a related topic.

library(tidyverse)
library(sandwich)

d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))

reg1 <- lm(y~x, data = d)

## Generate a prediction DF
pred_df <- data.frame(fit = predict(reg1))

## Get the design matrix
X_mat <- model.matrix(reg1)

## Get HAC VCOV matrix and calculate SEs
v_hac <- NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) ## HAC VCOV (adjusted for small data sample)
#> Warning in meatHAC(x, order.by = order.by, prewhite = prewhite, weights =
#> weights, : more weights than observations, only first n used
var_fit_hac <- rowSums((X_mat %*% v_hac) * X_mat)  ## Point-wise variance for predicted mean
se_fit_hac <- sqrt(var_fit_hac) ## SEs

## Add these to pred_df and calculate the 95% CI
pred_df <-
  pred_df %>%
  mutate(se_fit_hac = se_fit_hac) %>%
  mutate(
    lwr_hac = fit - qt(0.975, df=reg1$df.residual)*se_fit_hac,
    upr_hac = fit + qt(0.975, df=reg1$df.residual)*se_fit_hac
    )

pred_df
#>        fit se_fit_hac   lwr_hac  upr_hac
#> 1 20.95238   4.250961  9.149822 32.75494
#> 2 20.63810   2.945392 12.460377 28.81581
#> 3 20.32381   1.986900 14.807291 25.84033
#> 4 20.00952   1.971797 14.534936 25.48411
#> 5 19.69524   2.914785 11.602497 27.78798
#> 6 19.38095   4.215654  7.676421 31.08548

## Plot it
bind_cols(
  d,
  pred_df
  ) %>%
  ggplot(aes(x = x, y = y, ymin=lwr_hac, ymax=upr_hac)) + 
  geom_point() + 
  geom_ribbon(fill="#E41A1C", alpha=0.3, col=NA) + ## Robust (HAC) SEs
  geom_smooth(method = 'lm', formula = y~x, col = "grey50") + ## Just for comparison
  labs(
    title = "Plotting HAC SEs in ggplot2",
    subtitle = "Regular SEs in grey for comparison",
    caption = "Note: Do HAC SEs make sense for this dataset? Definitely not!"
    ) +
  theme_minimal()

Created on 2020-03-08 by the reprex package (v0.3.0)

Note that you could also use this approach to manually calculate and plot other robust SE predictions (e.g. HC1, HC2, etc.) if you so wished. All you would need to do is use the relevant sandwich estimator. For instance, using vcovHC(reg1, type = "HC2") instead of NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) will give you an identical HC-robust CI to the first example that uses the estimatr package.
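As a rough sketch of that swap, here is the same point-wise calculation with vcovHC() in place of NeweyWest(). It uses HC3 because that is what the original question asked about; the names v_hc3, se_fit_hc3 and pred_hc3 are made up for this illustration:

```r
library(sandwich)

d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))

reg1 <- lm(y ~ x, data = d)
X_mat <- model.matrix(reg1)

## Swap in the HC estimator of your choice (HC3 here)
v_hc3 <- vcovHC(reg1, type = "HC3")

## Point-wise SE of the predicted mean: sqrt of the diagonal of X V X'
se_fit_hc3 <- sqrt(rowSums((X_mat %*% v_hc3) * X_mat))

## Sanity check: the rowSums() shortcut matches the explicit diagonal
stopifnot(isTRUE(all.equal(
  unname(se_fit_hc3),
  unname(sqrt(diag(X_mat %*% v_hc3 %*% t(X_mat))))
)))

## 95% CI, ready to be passed to geom_ribbon() as in the HAC example
t_crit <- qt(0.975, df = reg1$df.residual)
pred_hc3 <- data.frame(
  x   = d$x,
  fit = predict(reg1),
  lwr = predict(reg1) - t_crit * se_fit_hc3,
  upr = predict(reg1) + t_crit * se_fit_hc3
)

pred_hc3
```

The rowSums() form is used only because it avoids building the full n x n matrix; on six observations either version is fine.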

Answer 2

I am very new to this whole robust SE thing, but I was able to generate the following:

zz = '
x y
1     1    12
2     2    24
3     3    24
4     4    34
5     5    12
6     6    15 
'

df <- read.table(text = zz, header = TRUE)
df

library(sandwich)
library(lmtest)

lm.model<-lm(y ~ x, data = df)
coef(lm.model)
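## Point-wise SEs of the fitted values, sqrt(diag(X V X')). Using
## sqrt(diag(vcovHC(...))) directly would give the two coefficient SEs,
## which R would silently recycle across the six rows.
X = model.matrix(lm.model)
se = sqrt(rowSums((X %*% vcovHC(lm.model, type = "HC3")) * X))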
fit = predict(lm.model)
predframe <- with(df,data.frame(x,
                                y = fit,
                                lwr = fit - 1.96 * se,
                                upr = fit + 1.96 * se))

library(ggplot2)
ggplot(df, aes(x = x, y = y))+
  geom_point()+
  geom_line(data = predframe)+
  geom_ribbon(data = predframe, aes(ymin = lwr,ymax = upr), alpha = 0.3)

Answer 1
1 2019-03-07 02:37:40

Answer 2
0 2017-07-26 03:52:05
