Zero-inflated two-part models in GLMMadaptive (R): anova on fixed effects zero-part?
I'm running a hurdle lognormal model using the GLMMadaptive package in R. Both the continuous part and the zero part have categorical variables defined in the fixed effects. I would like to run an ANOVA on these categorical variables to detect whether there is a main effect.

I've seen that with the glmmTMB package you can run an ANOVA on the conditional model and on the zero-part model separately, as is demonstrated here.

Is there a similar strategy available for the GLMMadaptive package? (glmmTMB does not support hurdle lognormal models, as far as I understand.) Perhaps using the joint_tests function from the emmeans package? If so, how do you specify that you want to test the zero-part model? As it stands, emmeans::joint_tests(hurdlemodel) only gives the F-tests for the conditional part of the model.

Or, as an alternative, could you compare the fit of models that exclude the variable of interest against the full model, as is demonstrated for the relevance of random effects in this vignette?

Many thanks!
The suggestion by Russ Lenth in the comments is implemented below, using the data and model from the GLMMadaptive two-part model vignette:
library(GLMMadaptive)
library(emmeans)
# data generating code from the vignette:
{
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 5 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects non-zero part
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ 1, data = DF)
# design matrices for the fixed and random effects zero part
X_zi <- model.matrix(~ sex, data = DF)
Z_zi <- model.matrix(~ 1, data = DF)
betas <- c(1.5, 0.05, 0.05, -0.03) # fixed effects coefficients non-zero part
shape <- 2 # shape/size parameter of the negative binomial distribution
gammas <- c(-1.5, 0.5) # fixed effects coefficients zero part
D11 <- 0.5 # variance of random intercepts non-zero part
D22 <- 0.4 # variance of random intercepts zero part
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor non-zero part
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, 1, drop = FALSE]))
# linear predictor zero part
eta_zi <- as.vector(X_zi %*% gammas + rowSums(Z_zi * b[DF$id, 2, drop = FALSE]))
# we simulate negative binomial longitudinal data
DF$y <- rnbinom(n * K, size = shape, mu = exp(eta_y))
# we set the extra zeros
DF$y[as.logical(rbinom(n * K, size = 1, prob = plogis(eta_zi)))] <- 0
}
# create categorical time variable
DF$time_categorical[DF$time < 2.5] <- "early"
DF$time_categorical[DF$time >= 2.5] <- "late"
DF$time_categorical <- as.factor(DF$time_categorical)
# model with interaction in fixed effects zero part and adding nesting in zero part as in model above
km3 <- mixed_model(y ~ sex * time_categorical, random = ~ 1 | id, data = DF,
                   family = hurdle.lognormal(), n_phis = 1,
                   zi_fixed = ~ sex * time_categorical, zi_random = ~ 1 | id)
#### ATTEMPT at qdrg() function in emmeans ####
coef_zero_part <- fixef(km3, sub_model = "zero_part")
# positions 9:12 are taken to be the zero-part fixed effects in vcov(km3);
# check the dimnames of vcov(km3) before relying on these indices in another model
vcov_zero_part <- vcov(km3)[9:12, 9:12]
qd_km3 <- emmeans::qdrg(formula = ~ sex * time_categorical, data = DF,
                        coef = coef_zero_part, vcov = vcov_zero_part)
Output:
> joint_tests(qd_km3)
 model term           df1 df2 F.ratio p.value
 sex                    1 Inf  11.509  0.0007
 time_categorical       1 Inf   0.488  0.4848
 sex:time_categorical   1 Inf   1.077  0.2993

> emmeans(qd_km3, pairwise ~ sex|time_categorical)
$emmeans
time_categorical = early:
 sex    emmean    SE  df asymp.LCL asymp.UCL
 male   -1.592 0.201 Inf     -1.99    -1.198
 female -1.035 0.187 Inf     -1.40    -0.669

time_categorical = late:
 sex    emmean    SE  df asymp.LCL asymp.UCL
 male   -1.914 0.247 Inf     -2.40    -1.429
 female -0.972 0.188 Inf     -1.34    -0.605

Confidence level used: 0.95

$contrasts
time_categorical = early:
 contrast      estimate    SE  df z.ratio p.value
 male - female   -0.557 0.270 Inf  -2.064  0.0390

time_categorical = late:
 contrast      estimate    SE  df z.ratio p.value
 male - female   -0.942 0.306 Inf  -3.079  0.0021
Checking whether the contrasts correspond to the zero-part fixed effects:
> fixef(km3, sub_model = "zero_part")
(Intercept) sexfemale time_categoricallate sexfemale:time_categoricallate
-1.5920415 0.5568072 -0.3220390 0.3849780
> (-1.5920) - (-1.5920 + 0.5568)
[1] -0.5568 #matches contrast within "early" level of "time_categorical"
> (-1.5920 + -0.3220) - (-1.5920 + -0.3220 + 0.5568 + 0.3850)
[1] -0.9418 #matches contrast within "late" level of "time_categorical"
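As a rough illustration of what a joint test on a model term amounts to, the sketch below computes a Wald statistic by hand from a coefficient vector and its covariance matrix. It uses a plain lm fit on the built-in warpbreaks data rather than the hurdle model, purely so that it runs without GLMMadaptive; the variable names (idx, wald_chisq, f_ratio) are made up for this example. The same arithmetic would apply to the zero-part coefficients and their covariance block, with the statistic referred to a chi-square distribution because df2 is Inf there.

```r
# Small model using base R only (warpbreaks ships with R)
fit <- lm(breaks ~ wool * tension, data = warpbreaks)

# Select the coefficients that make up the term of interest (here the interaction)
idx <- grep(":", names(coef(fit)), fixed = TRUE)
b <- coef(fit)[idx]
V <- vcov(fit)[idx, idx]

# Wald statistic b' V^{-1} b; dividing by the number of coefficients gives an F ratio
wald_chisq <- as.numeric(t(b) %*% solve(V) %*% b)
f_ratio <- wald_chisq / length(b)

# For comparison: the F ratio for the same term from the usual ANOVA table
f_anova <- anova(fit)["wool:tension", "F value"]

# Asymptotic p-value, as would be used when df2 = Inf
p_asymptotic <- pchisq(wald_chisq, df = length(b), lower.tail = FALSE)
```

For this lm the hand-computed F ratio agrees with the interaction row of anova(fit), which is a useful cross-check before applying the same steps to a covariance block extracted from a more complex model.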
The function emmeans::qdrg() can sometimes be used to create the needed object for a model not directly supported by emmeans. See its documentation. In very simple models (e.g., those inheriting from lm), it may be enough to supply the object and data arguments.
That usually does not work for more sophisticated models, in which case you will need to specify data, the fixed-effects formula for the conditional or zero part of the model, and the associated regression coefficients (coef) and variance-covariance matrix (vcov) for the part of the model in question. With multi-component models like this one, you will likely have to pick a subset of the coefficients and covariance matrix. These must all conform: the length of coef must equal the number of rows and columns of vcov and the number of columns in the model matrix generated by formula [which may be checked via model.matrix(formula, data = data)].
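The conformity requirement can be checked before calling qdrg(). The following is a minimal sketch using an lm fit on warpbreaks as a stand-in; the object names (form, dims_ok, names_ok) are illustrative only. For a multi-component model, b and V would be the subset belonging to the part being tested, and the name comparison may need adjusting if the package labels those coefficients differently.

```r
# Stand-in model; for a two-part model, b and V would be the relevant subset
fit <- lm(breaks ~ wool * tension, data = warpbreaks)
form <- ~ wool * tension

b <- coef(fit)
V <- vcov(fit)
X <- model.matrix(form, data = warpbreaks)

# Dimensions must agree: length(coef) == nrow(vcov) == ncol(vcov) == ncol(model matrix)
dims_ok <- length(b) == nrow(V) && nrow(V) == ncol(V) && length(b) == ncol(X)

# Optional extra check: the coefficients should also be in the same order
names_ok <- identical(names(b), colnames(X)) && identical(names(b), rownames(V))

stopifnot(dims_ok)
```

Matching dimensions alone do not guarantee matching order, which is why comparing names (where they are comparable) is a worthwhile second check.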
qdrg() will not work for a multivariate model (or at least it is tricky), because the implied model involves other factor(s) that delineate the levels of the multivariate response. If there are special provisions for, say, spline smoothing, that is another instance where qdrg() probably can't be made to work.
Once qdrg() actually runs and produces results, it is a good idea to use it to estimate some contrasts that are also estimated directly by the model parameterization. For example, suppose that the model was fitted with the default contr.treatment contrasts. Then the regression coefficients are interpretable as comparisons with the first level as the reference level. Accordingly, if we computed rg <- qdrg(...), and one of the factors is "treat", look at contrast(rg, "trt.vs.ctrl1", simple = "treat"), and check whether the first set of estimated contrasts matches the main-effect estimates for treat.
I will illustrate all of this with a simple lm model, ignoring the fact that it is already supported by emmeans.
> warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
Here is the reference grid:
> rg <- qdrg(~ wool * tension, coef = coef(warp.lm), vcov = vcov(warp.lm),
+ df = df.residual(warp.lm), data = warpbreaks)
Here is a sanity check. First, look at the model summary:
> summary(warp.lm)$coef
                Estimate Std. Error   t value     Pr(>|t|)
(Intercept)     44.55556   3.646761 12.217842 2.425903e-16
woolB          -16.33333   5.157299 -3.167032 2.676803e-03
tensionM       -20.55556   5.157299 -3.985721 2.280796e-04
tensionH       -20.00000   5.157299 -3.877999 3.199282e-04
woolB:tensionM  21.11111   7.293523  2.894501 5.698287e-03
woolB:tensionH  10.55556   7.293523  1.447251 1.543266e-01
Second, look at selected contrasts:
> contrast(rg, "trt.vs.ctrl1", simple = "wool")
tension = L:
 contrast estimate   SE df t.ratio p.value
 B - A      -16.33 5.16 48  -3.167  0.0027

tension = M:
 contrast estimate   SE df t.ratio p.value
 B - A        4.78 5.16 48   0.926  0.3589

tension = H:
 contrast estimate   SE df t.ratio p.value
 B - A       -5.78 5.16 48  -1.120  0.2682

> contrast(rg, "trt.vs.ctrl1", simple = "tension")
wool = A:
 contrast estimate   SE df t.ratio p.value
 M - L     -20.556 5.16 48  -3.986  0.0005
 H - L     -20.000 5.16 48  -3.878  0.0006

wool = B:
 contrast estimate   SE df t.ratio p.value
 M - L       0.556 5.16 48   0.108  0.9863
 H - L      -9.444 5.16 48  -1.831  0.1338

P value adjustment: dunnettx method for 2 tests
Comparing with the regression coefficients, we do confirm that the first contrast for wool is estimated as -16.33, matching the regression coefficient for woolB. Also, the first set of contrasts for tension is estimated as -20.556 and -20.0, matching the regression coefficients for tensionM and tensionH. The SEs and t ratios match as well. (The P values for the second set, the tension contrasts, do not match because of the multiplicity adjustment.)
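The remaining contrasts can be reproduced from the coefficients as well, in the same way the zero-part contrasts were checked by hand in the question. This is a small sketch under treatment coding, with wool A and tension L as the reference levels; wool_contrasts is just an illustrative name.

```r
fit <- lm(breaks ~ wool * tension, data = warpbreaks)
b <- coef(fit)

# B - A within each tension level: the wool main effect plus the
# interaction coefficient for that tension level (none for the reference level L)
wool_contrasts <- c(
  L = unname(b["woolB"]),
  M = unname(b["woolB"] + b["woolB:tensionM"]),
  H = unname(b["woolB"] + b["woolB:tensionH"])
)

round(wool_contrasts, 2)
```

Rounded to two decimals, these reproduce the B - A estimates of -16.33, 4.78 and -5.78 shown in the contrast output above.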