
How to interpret the confint function for a fitted model? Why might its output not match the bootstrap CI plotted with ggplot2?

I am trying to reconcile the confidence intervals shown on the ggplot (bootstrapped CIs) with those I compute from the lmer model, and I am unsure how to calculate the latter. How would I then plot the original points together with the model-based means and CIs?

library(lmerTest)  # also loads lme4; provides the df and p-value columns shown below
set.seed(111)
oviposition.index <- rnorm(20, 2, 1.3)
species <- rep(c("A","B"), each = 10)
month <- rep(c("Jan", "Feb"), times = 10)
plot <- rep(c("1", "2"), times = 10)
df <- data.frame(oviposition.index, species, month, plot)

mod <- lmer(oviposition.index ~ species + (1|month/plot), df)
summary(mod)
confint(mod)

Model summary and confidence intervals

Fixed effects:
            Estimate Std. Error      df t value Pr(>|t|)  
(Intercept)   1.1303     0.3684  3.5822   3.068   0.0432 *
speciesB      0.8198     0.5131 17.0000   1.598   0.1285 



                2.5 %   97.5 %
.sig01       0.0000000 1.130819
.sig02       0.0000000 1.130819
.sigma       0.8232305 1.540289
(Intercept)  0.3600376 1.900525
speciesB    -0.1836920 1.823253

The way I see it: Species A:

Lower CI = 1.1303 - 0.3600376 = 0.7702624 (DOES NOT match graph)

Upper CI = 1.1303 + 1.900525 = 3.030825 (DOES NOT match graph)

Species B:

Lower CI = 0.8198 - (-0.1836920 )= 1.003492 (roughly matches graph)

Upper CI = 0.8198 + (1.823253) = 2.643053 (roughly matches graph)

Plot shows

library(ggplot2)  # mean_cl_boot also needs the Hmisc package to be installed
ggplot(df, aes(x = species, y = oviposition.index, color = species)) + geom_point() +
   geom_hline(yintercept = 1) +
   stat_summary(fun.data = mean_cl_boot, geom = "errorbar", width = 0.2, colour = "black") +
   stat_summary(fun = mean, color = "black", geom = "point", size = 3, show.legend = FALSE)


The confidence intervals won't be the same because your mixed-effects model has grouping variables that ggplot's (really Hmisc's) bootstrap CI function knows nothing about. Ultimately this leads to the mixed-effects model estimating more uncertainty in this scenario, which we see in the wider CIs.

That said, the CIs from lmer are close to what you have already plotted. Group A (the intercept) has an estimated mean of 1.1303 with a standard error of 0.3684, and group B has an estimated mean of about 1.95 (1.13 + 0.82); the larger standard error of 0.5131 reported for speciesB belongs to the B minus A difference. I don't think your interpretation of group differences will change with either one of the CI calculations.
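As a minimal sketch of how to read that output (assuming the same simulated data as in the question, and using Wald intervals only because they are fast): each row returned by confint() already holds the lower and upper bound, so nothing is added to or subtracted from the estimate. The (Intercept) row is the interval for the species A mean, and the speciesB row is the interval for the B minus A difference, not for the species B mean.

```r
library(lme4)

# Same simulated data as in the question
set.seed(111)
df <- data.frame(
  oviposition.index = rnorm(20, 2, 1.3),
  species = rep(c("A", "B"), each = 10),
  month   = rep(c("Jan", "Feb"), times = 10),
  plot    = rep(c("1", "2"), times = 10)
)

mod <- lmer(oviposition.index ~ species + (1 | month/plot), df)

# parm = "beta_" restricts the output to the fixed effects
ci <- confint(mod, parm = "beta_", method = "Wald")

# Each estimate next to its own interval (bounds, not offsets)
cbind(estimate = fixef(mod), ci)
```

Switching method to "profile" or "boot" changes how the bounds are computed, but not how they are read.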

A few more points to add to @Nate's answer.

The idea that the bootstrap function from Hmisc (which is what mean_cl_boot uses) is wrong because it doesn't take the grouping structure into account is basically correct.
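To make that concrete, here is a rough base-R sketch of the kind of percentile bootstrap that smean.cl.boot performs for one species. The names x, B and boot_means are illustrative, and the numbers will not match Hmisc exactly because the random draws differ. Every observation is resampled as if it were independent; month and plot never enter the calculation.

```r
# Same simulated response as in the question
set.seed(111)
oviposition.index <- rnorm(20, 2, 1.3)
x <- oviposition.index[1:10]   # species A only

B <- 1000                      # number of bootstrap resamples
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Percentile interval taken from the resampled means
naive_ci <- quantile(boot_means, c(0.025, 0.975))
c(mean = mean(x), naive_ci)
```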

I modified your model formula slightly to make it more convenient to look at the confidence intervals for species A (suppressing the intercept by including -1 in the formula). I also fitted it with and without lmerTest, to allow some comparisons discussed in more detail below.

library(lme4)
mod0 <- lmer(oviposition.index ~ species-1 + (1|month/plot), df)
library(lmerTest)
mod1 <- as(mod0, "lmerModLmerTest")

library(broom.mixed)

f <- function(m, mod = mod0, ...) {
  tt <- tidy(mod, conf.int = TRUE, effects = "fixed", conf.method = m, ...)
  as.data.frame(tt)[1, c("estimate", "conf.low", "conf.high")]
}
ctab <- rbind(
    hmboot = Hmisc::smean.cl.boot(oviposition.index[1:10]),
    hmwald = Hmisc::smean.cl.normal(oviposition.index[1:10]),
    wald_lmer = f("Wald"),
    wald_t_satt = f("Wald", mod1),
    wald_t_kr = f("Wald", mod1, ddf.method = "Kenward-Roger"),
    profile = f("profile"),
    pboot = f("boot")
)
print(ctab, digits = 3)
  • A Wald interval is based on the estimated curvature of the likelihood surface at the ML estimate. It's usually the fastest but least accurate method; it always gives symmetric CIs. It can be based on the assumption of a Normal sampling distribution of the estimate or on a t-distribution; if the latter, you need to specify some method of approximating the 'degrees of freedom' parameter of the t-distribution.
  • Profile likelihood is based on tracing how the likelihood changes as each parameter moves away from its estimate, rather than on the curvature at the estimate alone. It's more reliable (and slower) than Wald, but doesn't take small sample sizes into account.
  • Parametric bootstrap is the most reliable, but slowest, method. It is based on simulating new data sets from the fitted model.
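As a small worked example of the Wald case, the Normal-based interval is just the estimate plus or minus a Normal quantile times the standard error. The sketch below uses the species A estimate and standard error reported in the question's model summary (1.1303 and 0.3684); the variable names are illustrative.

```r
est <- 1.1303   # species A estimate from the model summary
se  <- 0.3684   # its standard error

z <- qnorm(0.975)                    # about 1.96 for a 95% interval
wald_z <- est + c(-1, 1) * z * se    # lower and upper bound
round(wald_z, 4)
# [1] 0.4082 1.8524
```

This agrees with the wald_lmer row of the results table (0.4082 to 1.85). A t-based Wald interval replaces qnorm(0.975) with qt(0.975, df), which is why the choice of degrees-of-freedom approximation matters so much when there are so few groups.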

The conclusion here is that the methods all give approximately the same CIs. The naive bootstrap (as you've used above) gives the (slightly) narrowest CIs, and the Wald interval with Kenward-Roger degrees of freedom gives the widest (probably overconservative, as the parametric bootstrap (pboot) probably gives the best answer). (The Satterthwaite ddf approximation completely breaks down in this example.)
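For readers who want to see the parametric bootstrap spelled out rather than hidden behind conf.method = "boot", the following is a rough sketch using lme4::bootMer, which simulates new responses from the fitted model and refits it each time. The values nsim = 200 and seed = 101 are arbitrary choices for illustration, and the percentile intervals here are an assumption about a reasonable summary, so the result need not match the pboot row exactly.

```r
library(lme4)

# Same simulated data as in the question
set.seed(111)
df <- data.frame(
  oviposition.index = rnorm(20, 2, 1.3),
  species = rep(c("A", "B"), each = 10),
  month   = rep(c("Jan", "Feb"), times = 10),
  plot    = rep(c("1", "2"), times = 10)
)
mod0 <- lmer(oviposition.index ~ species - 1 + (1 | month/plot), df)

# Simulate from the fitted model, refit, and keep the fixed effects.
# nsim is kept small for speed; singular-fit messages are expected here.
pb <- bootMer(mod0, FUN = fixef, nsim = 200, seed = 101)

# Percentile intervals, one column per species
pb_ci <- apply(pb$t, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
colnames(pb_ci) <- names(fixef(mod0))
pb_ci
```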

            estimate conf.low conf.high  
hmboot          1.13   0.4397      1.68 ## naive bootstrap
hmwald          1.13   0.4005      1.86 ## naive Wald (t-distrib)
wald_lmer       1.13   0.4082      1.85 ## mixed-model Wald (Z-distrib)
wald_t_satt     1.13      NaN       NaN ## mixed-model Wald (Satterthwaite)
wald_t_kr       1.13   0.0586      2.20 ## mixed-model Wald (Kenward-Roger)
profile         1.13   0.3600      1.90 ## likelihood profile CI
pboot           1.13   0.4111      1.82 ## parametric bootstrap CI

[Figure: estimates and the various confidence intervals]

If we get a little fancier (code below) we can get CIs for both groups:

[Figure: CIs for both species]

library(Hmisc)
## fixed-effect estimates and CIs for the terms in rows w (default: both species)
f <- function(m, mod = mod0, w = 1:2, ...) {
  tt <- tidy(mod, conf.int = TRUE, effects = "fixed", conf.method = m, ...)
  tt[w, c("term", "estimate", "conf.low", "conf.high")]
}

## apply an Hmisc summary function (mean, lower, upper) to each species separately
h <- function(sfun) {
  tab <- do.call(rbind, lapply(split(df, df$species),
                               function(d) sfun(d$oviposition.index)))
  tab <- data.frame(term = paste0("species", c("A", "B")),
                    setNames(as.data.frame(tab), c("estimate", "conf.low", "conf.high")))
  return(tab)
}
h(smean.cl.normal)

tab2 <- dplyr::bind_rows(list(
    hmisc_boot = h(smean.cl.boot),
    hmisc_normal = h(smean.cl.normal),
    wald_lmer = f("Wald"),
    wald_t_satt = f("Wald", mod1),
    wald_t_kr = f("Wald", mod1, ddf.method = "Kenward-Roger"),
    profile = f("profile"),
    boot = f("boot")),
    .id = "method")

tab2$method <- factor(tab2$method, levels = unique(tab2$method))
ggplot(tab2, aes(x=term, y = estimate, colour = method)) +
  geom_pointrange(aes(ymin=conf.low, ymax = conf.high), position = position_dodge(width=0.25)) +
  geom_hline(yintercept = 1, lty = 2)
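To address the last part of the question (plotting the original points together with the model-based means and CIs), one possible sketch is shown below. It assumes the intercept-free model from above and uses Wald intervals purely for speed; the data frame name est is illustrative, and any of the other CI methods could be substituted for the confint() call.

```r
library(lme4)
library(ggplot2)

# Same simulated data as in the question
set.seed(111)
df <- data.frame(
  oviposition.index = rnorm(20, 2, 1.3),
  species = rep(c("A", "B"), each = 10),
  month   = rep(c("Jan", "Feb"), times = 10),
  plot    = rep(c("1", "2"), times = 10)
)
mod0 <- lmer(oviposition.index ~ species - 1 + (1 | month/plot), df)

# Model-based mean and CI for each species
ci <- confint(mod0, parm = "beta_", method = "Wald")
est <- data.frame(
  species   = c("A", "B"),
  estimate  = unname(fixef(mod0)),
  conf.low  = ci[, 1],
  conf.high = ci[, 2]
)

# Raw points in colour, model estimates and intervals in black
p <- ggplot() +
  geom_point(data = df,
             aes(x = species, y = oviposition.index, colour = species)) +
  geom_pointrange(data = est,
                  aes(x = species, y = estimate,
                      ymin = conf.low, ymax = conf.high),
                  colour = "black") +
  geom_hline(yintercept = 1, linetype = 2)
p
```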
