ggplot2: how to get robust confidence interval for predictions in geom_smooth?

Consider this simple example:

library(tidyverse)

## tibble() replaces the deprecated data_frame()
dataframe <- tibble(x = c(1,2,3,4,5,6),
                    y = c(12,24,24,34,12,15))
> dataframe
# A tibble: 6 x 2
      x     y
  <dbl> <dbl>
1     1    12
2     2    24
3     3    24
4     4    34
5     5    12
6     6    15    

dataframe %>% ggplot(., aes(x = x, y = y)) + 
geom_point() + 
geom_smooth(method = 'lm', formula = y~x)

Here the standard errors are computed with the default option. However, I would like to use the robust variance-covariance matrix available in the sandwich and lmtest packages.

That is, using vcovHC(mymodel, "HC3").

Is there a simple way to get that using the geom_smooth() function?


UPDATE (2021-03-17): It was recently pointed out to me that the ggeffects package handles different VCOVs automatically, including the trickier HAC case that I originally demonstrated below. A quick example of the latter:

library(ggeffects)
library(sandwich)  ## For HAC and other robust VCOVs

d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))

reg1 <- lm(y ~ x, data = d)

plot(ggpredict(reg1, "x", vcov.fun = "vcovHAC"))
#> Loading required namespace: ggplot2

## This gives you a regular ggplot2 object. So you can add layers as you
## normally would. E.g. If you'd like to compare with the original data...
library(ggplot2)
last_plot() +
    geom_point(data = d, aes(x, y)) +
    labs(caption = 'Shaded region indicates HAC 95% CI.')

Created on 2021-03-17 by the reprex package (v1.0.0)

My original answer follows below...

HC robust SEs (simple)

This is easily done now thanks to the estimatr package and its family of lm_robust functions. E.g.

library(tidyverse)
library(estimatr)

d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))

d %>% 
  ggplot(aes(x = x, y = y)) + 
  geom_point() + 
  geom_smooth(method = 'lm_robust', formula = y~x, fill="#E41A1C") + ## Robust (HC) SEs
  geom_smooth(method = 'lm', formula = y~x, col = "grey50") + ## Just for comparison
  labs(
    title = "Plotting HC robust SEs in ggplot2",
    subtitle = "Regular SEs in grey for comparison"
    ) +
  theme_minimal()

Created on 2020-03-08 by the reprex package (v0.3.0)
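
The example above relies on the default standard errors of lm_robust, which are HC2. If you want the HC3 variant that the question asks about, one option is to pass se_type through geom_smooth's method.args. This is only a sketch: it assumes that your installed ggplot2 forwards method.args to lm_robust() and that estimatr accepts se_type = "HC3". The object name p_hc3 is just a placeholder for illustration.

```r
library(ggplot2)
library(estimatr)

d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))

## method.args is handed on to lm_robust(); se_type selects the HC variant
p_hc3 <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm_robust", formula = y ~ x,
              method.args = list(se_type = "HC3"))

p_hc3
```

The fitted line is the same as with plain lm; only the width of the shaded band changes.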

HAC robust SEs (a bit more legwork)

The one caveat is that estimatr does not yet offer support for HAC (i.e. heteroscedasticity and autocorrelation consistent) SEs à la Newey-West. However, it is possible to obtain these manually with the sandwich package... which is kind of what the original question was asking anyway. You can then plot them using geom_ribbon().

I'll say for the record that HAC SEs don't make much sense for this particular data set. But here's an example of how you could do it, riffing off this excellent SO answer on a related topic.

library(tidyverse)
library(sandwich)

d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))

reg1 <- lm(y~x, data = d)

## Generate a prediction DF
pred_df <- data.frame(fit = predict(reg1))

## Get the design matrix
X_mat <- model.matrix(reg1)

## Get HAC VCOV matrix and calculate SEs
v_hac <- NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) ## HAC VCOV (adjusted for small data sample)
#> Warning in meatHAC(x, order.by = order.by, prewhite = prewhite, weights =
#> weights, : more weights than observations, only first n used
var_fit_hac <- rowSums((X_mat %*% v_hac) * X_mat)  ## Point-wise variance for predicted mean
se_fit_hac <- sqrt(var_fit_hac) ## SEs

## Add these to pred_df and calculate the 95% CI
pred_df <-
  pred_df %>%
  mutate(se_fit_hac = se_fit_hac) %>%
  mutate(
    lwr_hac = fit - qt(0.975, df=reg1$df.residual)*se_fit_hac,
    upr_hac = fit + qt(0.975, df=reg1$df.residual)*se_fit_hac
    )

pred_df
#>        fit se_fit_hac   lwr_hac  upr_hac
#> 1 20.95238   4.250961  9.149822 32.75494
#> 2 20.63810   2.945392 12.460377 28.81581
#> 3 20.32381   1.986900 14.807291 25.84033
#> 4 20.00952   1.971797 14.534936 25.48411
#> 5 19.69524   2.914785 11.602497 27.78798
#> 6 19.38095   4.215654  7.676421 31.08548

## Plot it
bind_cols(
  d,
  pred_df
  ) %>%
  ggplot(aes(x = x, y = y, ymin=lwr_hac, ymax=upr_hac)) + 
  geom_point() + 
  geom_ribbon(fill="#E41A1C", alpha=0.3, col=NA) + ## Robust (HAC) SEs
  geom_smooth(method = 'lm', formula = y~x, col = "grey50") + ## Just for comparison
  labs(
    title = "Plotting HAC SEs in ggplot2",
    subtitle = "Regular SEs in grey for comparison",
    caption = "Note: Do HAC SEs make sense for this dataset? Definitely not!"
    ) +
  theme_minimal()

Created on 2020-03-08 by the reprex package (v0.3.0)
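
In case the rowSums((X_mat %*% v_hac) * X_mat) step looks opaque: it is a shortcut for the diagonal of X V X', i.e. the variance of each fitted value x_i' V x_i, computed without building the full n x n matrix. The base-R sketch below checks this using the classical VCOV from vcov() as a stand-in (my choice for illustration, so that the result can be compared against predict()'s own standard errors); any coefficient VCOV, HC or HAC, plugs in the same way.

```r
d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))
reg1 <- lm(y ~ x, data = d)

X <- model.matrix(reg1)   ## n x p design matrix
V <- vcov(reg1)           ## classical VCOV, used here only as a stand-in

## Shortcut: row-wise sums of (X V) * X
var_fast <- rowSums((X %*% V) * X)

## Textbook form: diagonal of X V X'
var_full <- diag(X %*% V %*% t(X))

## With the classical VCOV, this should match predict()'s standard errors
se_builtin <- predict(reg1, se.fit = TRUE)$se.fit

all.equal(unname(var_fast), unname(var_full))
all.equal(unname(sqrt(var_fast)), unname(se_builtin))
```

Swapping V for a robust matrix changes the standard errors but not the recipe.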

Note that you could also use this approach to manually calculate and plot other robust SE predictions (e.g. HC1, HC2, etc.) if you so wished. All you would need to do is use the relevant sandwich estimator. For instance, using vcovHC(reg1, type = "HC2") instead of NeweyWest(reg1, prewhite = FALSE, adjust = TRUE) will give you an identical HC-robust CI to the first example that uses the estimatr package.
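
As a rough sketch of that swap for the HC3 case from the question, the following evaluates the band on a prediction grid and draws it with geom_ribbon(). The names grid, X_grid, v_hc3, crit and p_hc3_manual, and the choice of 50 grid points, are mine for illustration and do not come from the answer above.

```r
library(ggplot2)
library(sandwich)

d <- data.frame(x = c(1,2,3,4,5,6),
                y = c(12,24,24,34,12,15))
reg1 <- lm(y ~ x, data = d)

## Prediction grid, finer than the data so the ribbon is drawn smoothly
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 50))
X_grid <- model.matrix(~ x, data = grid)

## HC3 VCOV of the coefficients, then point-wise SEs of the fitted line
v_hc3 <- vcovHC(reg1, type = "HC3")
grid$fit <- drop(X_grid %*% coef(reg1))
grid$se  <- sqrt(rowSums((X_grid %*% v_hc3) * X_grid))

## 95% CI based on the t distribution, as in the HAC example
crit <- qt(0.975, df = reg1$df.residual)
grid$lwr <- grid$fit - crit * grid$se
grid$upr <- grid$fit + crit * grid$se

p_hc3_manual <- ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_line(data = grid, aes(y = fit)) +
  ## inherit.aes = FALSE because grid has no y column
  geom_ribbon(data = grid, aes(x = x, ymin = lwr, ymax = upr),
              inherit.aes = FALSE, fill = "#E41A1C", alpha = 0.3)

p_hc3_manual
```

Using a grid rather than the six observed x values is optional for a straight-line fit, but it keeps the code unchanged if you later switch to a curved formula.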

I am very new to this whole robust SE thing, but I was able to generate the following:

zz = '
x y
1     1    12
2     2    24
3     3    24
4     4    34
5     5    12
6     6    15 
'

df <- read.table(text = zz, header = TRUE)
df

library(sandwich)
library(lmtest)

lm.model<-lm(y ~ x, data = df)
coef(lm.model)
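## Point-wise SEs of the fitted values, sqrt(diag(X V X')), rather than the coefficient SEs
X = model.matrix(lm.model)
se = sqrt(rowSums((X %*% vcovHC(lm.model, type = "HC3")) * X))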
fit = predict(lm.model)
predframe <- with(df,data.frame(x,
                                y = fit,
                                lwr = fit - 1.96 * se,
                                upr = fit + 1.96 * se))

library(ggplot2)
ggplot(df, aes(x = x, y = y))+
  geom_point()+
  geom_line(data = predframe)+
  geom_ribbon(data = predframe, aes(ymin = lwr,ymax = upr), alpha = 0.3)
