
Zero-inflated two-part models in GLMMadaptive (R): anova on fixed effects zero-part?

I'm running a hurdle lognormal model using the GLMMadaptive package in R. Both the continuous part and the zero part have categorical variables in their fixed effects. I would like to run an ANOVA on these categorical variables to test whether each has a main effect.

I've seen that with the glmmTMB package you can run an ANOVA separately on the conditional model and on the zero-part model, as is demonstrated here.

Is there a similar strategy available for the GLMMadaptive package? (glmmTMB does not support hurdle lognormal models, as far as I understand.) Perhaps using the joint_tests function from the emmeans package? If so, how do you specify that you want to test the zero-part model? As it stands, emmeans::joint_tests(hurdlemodel) only gives the F-tests for the conditional part of the model.

Or, as an alternative, could you compare the fit of a model that excludes the variable of interest against the full model, as is demonstrated for the relevance of random effects in this vignette?

Many thanks!


The suggestion by Russ Lenth in the comments is implemented below, using the data and model in the GLMMadaptive two-part model vignette:

library(GLMMadaptive)
library(emmeans)

# data generating code from the vignette:
{
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 5 # maximum follow-up time

# we construct a data frame with the design: 
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))

# design matrices for the fixed and random effects non-zero part
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ 1, data = DF)
# design matrices for the fixed and random effects zero part
X_zi <- model.matrix(~ sex, data = DF)
Z_zi <- model.matrix(~ 1, data = DF)

betas <- c(1.5, 0.05, 0.05, -0.03) # fixed effects coefficients non-zero part
shape <- 2 # shape/size parameter of the negative binomial distribution
gammas <- c(-1.5, 0.5) # fixed effects coefficients zero part
D11 <- 0.5 # variance of random intercepts non-zero part
D22 <- 0.4 # variance of random intercepts zero part

# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor non-zero part
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, 1, drop = FALSE]))
# linear predictor zero part
eta_zi <- as.vector(X_zi %*% gammas + rowSums(Z_zi * b[DF$id, 2, drop = FALSE]))
# we simulate negative binomial longitudinal data
DF$y <- rnbinom(n * K, size = shape, mu = exp(eta_y))
# we set the extra zeros
DF$y[as.logical(rbinom(n * K, size = 1, prob = plogis(eta_zi)))] <- 0
}

#create categorical time variable
DF$time_categorical[DF$time<2.5] <- "early"
DF$time_categorical[DF$time>=2.5] <- "late"
DF$time_categorical <- as.factor(DF$time_categorical)

# hurdle lognormal model with a sex-by-time interaction in the fixed effects of
# both parts and a random intercept per subject in each part
km3 <- mixed_model(y ~ sex * time_categorical, random = ~ 1 | id, data = DF, 
                   family = hurdle.lognormal(), n_phis = 1,
                   zi_fixed = ~ sex * time_categorical, zi_random = ~ 1 | id)

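#### ATTEMPT at QDRG function in emmeans ####

# fixed effects of the zero part only
coef_zero_part <- fixef(km3, sub_model = "zero_part")
# matching block of the full covariance matrix; positions 9:12 are specific
# to this model, so check them against dimnames(vcov(km3)) before reuse
vcov_zero_part <- vcov(km3)[9:12, 9:12]

qd_km3 <- emmeans::qdrg(formula = ~ sex * time_categorical, data = DF,
                        coef = coef_zero_part, vcov = vcov_zero_part)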

Output:

> joint_tests(qd_km3)
 model term           df1 df2 F.ratio p.value
 sex                    1 Inf  11.509 0.0007 
 time_categorical       1 Inf   0.488 0.4848 
 sex:time_categorical   1 Inf   1.077 0.2993 

> emmeans(qd_km3, pairwise ~ sex|time_categorical)
$emmeans
time_categorical = early:
 sex    emmean    SE  df asymp.LCL asymp.UCL
 male   -1.592 0.201 Inf     -1.99    -1.198
 female -1.035 0.187 Inf     -1.40    -0.669

time_categorical = late:
 sex    emmean    SE  df asymp.LCL asymp.UCL
 male   -1.914 0.247 Inf     -2.40    -1.429
 female -0.972 0.188 Inf     -1.34    -0.605

Confidence level used: 0.95 

$contrasts
time_categorical = early:
 contrast      estimate    SE  df z.ratio p.value
 male - female   -0.557 0.270 Inf -2.064  0.0390 

time_categorical = late:
 contrast      estimate    SE  df z.ratio p.value
 male - female   -0.942 0.306 Inf -3.079  0.0021 
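
Because df2 is Inf in the joint_tests() output, each F ratio acts as an asymptotic Wald statistic: df1 * F.ratio can be referred to a chi-square distribution with df1 degrees of freedom. The following is a minimal sketch of that relationship, not a description of how emmeans computes it. The F ratios are copied, already rounded, from the table above, so agreement with the printed p-values is only to about three decimals.

```r
# F ratios copied (rounded) from the joint_tests(qd_km3) output above
f_ratio <- c("sex" = 11.509,
             "time_categorical" = 0.488,
             "sex:time_categorical" = 1.077)
df1 <- 1  # numerator df of every term in that table

# With df2 = Inf, df1 * F is asymptotically chi-square with df1 df
p_value <- pchisq(df1 * f_ratio, df = df1, lower.tail = FALSE)
print(round(p_value, 4))
```

The resulting values should be close to the p.value column reported by joint_tests().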

Checking whether the contrasts correspond with the zero-part fixed effects:

> fixef(km3, sub_model = "zero_part")
                   (Intercept)                      sexfemale           time_categoricallate sexfemale:time_categoricallate 
                    -1.5920415                      0.5568072                     -0.3220390                      0.3849780 

> (-1.5920) - (-1.5920 + 0.5568)
[1] -0.5568 #matches contrast within "early" level of "time_categorical"
> (-1.5920 + -0.3220) - (-1.5920 + -0.3220  + 0.5568 + 0.3850)
[1] -0.9418 #matches contrast within "late" level of "time_categorical"
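
The same check can be written with a model matrix instead of hand arithmetic. This is a sketch in base R only; it assumes the coefficient values printed above and the factor level order used in DF (male before female, early before late), and the object names gammas, grid and male_minus_female are illustrative.

```r
# Zero-part fixed effects copied from fixef(km3, sub_model = "zero_part") above
gammas <- c("(Intercept)" = -1.5920415,
            "sexfemale" = 0.5568072,
            "time_categoricallate" = -0.3220390,
            "sexfemale:time_categoricallate" = 0.3849780)

# One row per sex x time cell; sex varies fastest within each time level
grid <- expand.grid(
  sex = factor(c("male", "female"), levels = c("male", "female")),
  time_categorical = factor(c("early", "late"), levels = c("early", "late"))
)

# Same fixed-effects formula that was passed to qdrg()
X <- model.matrix(~ sex * time_categorical, data = grid)
grid$emmean <- drop(X %*% gammas)  # cell means on the linear-predictor scale

# male - female within each level of time_categorical
male_minus_female <- with(grid, tapply(emmean, time_categorical,
                                       function(m) m[1] - m[2]))
print(grid)
print(round(male_minus_female, 3))
```

The cell means and differences should agree with the $emmeans and $contrasts tables up to rounding.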

The function emmeans::qdrg() can sometimes be used to create the needed object for a model not directly supported by emmeans. See its documentation. In very simple models (e.g., those inheriting from lm), it may be enough to supply the object and data arguments.

That usually does not work for more sophisticated models, in which case you will need to specify data, the fixed-effects formula for the conditional or zero part of the model, and the associated regression coefficients (coef) and variance-covariance matrix (vcov) for the part of the model in question. With multi-component models like this one, you will likely have to pick a subset of the coefficients and of the covariance matrix. These must all conform: the length of coef must equal the number of rows and columns of vcov, and also the number of columns in the model matrix generated by formula [which may be checked via model.matrix(formula, data = data)].
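
The conformity requirement can be checked before calling qdrg(). Below is a minimal sketch in base R that uses the built-in warpbreaks data and an lm fit purely as a stand-in. The names fit, fixed_formula, keep and conforms are illustrative, and selecting by name assumes the coefficient names match the model-matrix column names, which need not hold for every multi-component model.

```r
# Stand-in model: the built-in warpbreaks data fitted with lm()
fit <- lm(breaks ~ wool * tension, data = warpbreaks)
fixed_formula <- ~ wool * tension  # right-hand side only, as passed to qdrg()

X <- model.matrix(fixed_formula, data = warpbreaks)
keep <- colnames(X)  # coefficient names implied by the formula

# Select by name rather than by position
b <- coef(fit)[keep]
V <- vcov(fit)[keep, keep, drop = FALSE]

# coef, vcov and the model matrix must all agree in size and naming
conforms <- length(b) == ncol(X) &&
  all(dim(V) == length(b)) &&
  identical(names(b), colnames(V)) &&
  !anyNA(b)
print(conforms)
```

If conforms is FALSE, the subset of coefficients or the block of the covariance matrix does not correspond to the formula, and qdrg() results should not be trusted.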

qdrg() will not work for a multivariate model, or at least it's tricky, because the implied model involves other factor(s) that delineate the levels of the multivariate response. If there are special provisions for, say, spline smoothing, that is another instance where qdrg() probably can't be made to work.

Once qdrg() actually runs and produces results, it is a good idea to use it to estimate some contrasts that are already estimated by the model parameterization. For example, suppose that the model was fitted with the default contr.treatment contrasts. Then the regression coefficients are interpretable as comparisons with the first level as the reference level. Accordingly, if we computed rg <- qdrg(...), and one of the factors is "treat", look at contrast(rg, "trt.vs.ctrl1", simple = "treat"), and check whether the first set of estimated contrasts matches the main-effect estimates for treat.

I will illustrate all of this with a simple lm model, ignoring the fact that it is already supported by emmeans.

> warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)

Here is the reference grid:

> rg <- qdrg(~ wool * tension, coef = coef(warp.lm), vcov = vcov(warp.lm),
+     df = df.residual(warp.lm), data = warpbreaks)

Here is a sanity check. First, look at the model summary:

> summary(warp.lm)$coef
                Estimate Std. Error   t value     Pr(>|t|)
(Intercept)     44.55556   3.646761 12.217842 2.425903e-16
woolB          -16.33333   5.157299 -3.167032 2.676803e-03
tensionM       -20.55556   5.157299 -3.985721 2.280796e-04
tensionH       -20.00000   5.157299 -3.877999 3.199282e-04
woolB:tensionM  21.11111   7.293523  2.894501 5.698287e-03
woolB:tensionH  10.55556   7.293523  1.447251 1.543266e-01

Second, look at selected contrasts:

> contrast(rg, "trt.vs.ctrl1", simple = "wool")
tension = L:
 contrast estimate   SE df t.ratio p.value
 B - A      -16.33 5.16 48 -3.167  0.0027 

tension = M:
 contrast estimate   SE df t.ratio p.value
 B - A        4.78 5.16 48  0.926  0.3589 

tension = H:
 contrast estimate   SE df t.ratio p.value
 B - A       -5.78 5.16 48 -1.120  0.2682 

> contrast(rg, "trt.vs.ctrl1", simple = "tension")
wool = A:
 contrast estimate   SE df t.ratio p.value
 M - L     -20.556 5.16 48 -3.986  0.0005 
 H - L     -20.000 5.16 48 -3.878  0.0006 

wool = B:
 contrast estimate   SE df t.ratio p.value
 M - L       0.556 5.16 48  0.108  0.9863 
 H - L      -9.444 5.16 48 -1.831  0.1338 

P value adjustment: dunnettx method for 2 tests 

Comparing with the regression coefficients, we do confirm that the first contrast for wool is estimated as -16.33, matching the regression coefficient for woolB. Also, the first set of contrasts for tension is estimated as -20.556 and -20.0, matching the regression coefficients for tensionM and tensionH. The SEs and t ratios match as well. (The P values for the second set, the tension contrasts, do not match due to the multiplicity adjustment.)
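
For intuition, the first wool contrast can also be reproduced without emmeans, from nothing more than the coefficients, the covariance matrix, and a model matrix built on a grid of factor levels. This is a sketch of the idea in this simple case, not a description of emmeans internals; the names grid, L and d are illustrative.

```r
fit <- lm(breaks ~ wool * tension, data = warpbreaks)

# One row per wool x tension cell; expand.grid() keeps the level order given
grid <- expand.grid(wool = levels(warpbreaks$wool),
                    tension = levels(warpbreaks$tension))

# Each row of L is the linear function of the coefficients giving a cell mean
L <- model.matrix(~ wool * tension, data = grid)
b <- coef(fit)
V <- vcov(fit)

# B - A at tension = L: grid row 2 minus grid row 1
d <- L[2, ] - L[1, ]
est <- sum(d * b)
se <- sqrt(drop(t(d) %*% V %*% d))
t_ratio <- est / se
p <- 2 * pt(abs(t_ratio), df = df.residual(fit), lower.tail = FALSE)
print(round(c(estimate = est, SE = se, t.ratio = t_ratio, p.value = p), 4))
```

The printed values should agree with the woolB row of summary(warp.lm)$coef and with the first wool contrast above. Other contrasts follow by changing which grid rows are subtracted.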
