Random Effects in Longitudinal Multilevel Imputation Models Using MICE
I am trying to impute data in a dataset with a longitudinal design. There are two predictors (experimental group and time) and one outcome variable (score). The clustering variable is id.
Here is the toy data:
set.seed(345)
A0 <- rnorm(4,2,.5)
B0 <- rnorm(4,2+3,.5)
A1 <- rnorm(4,6,.5)
B1 <- rnorm(4,6+2,.5)
A2 <- rnorm(4,10,.5)
B2 <- rnorm(4,10+1,.5)
A3 <- rnorm(4,14,.5)
B3 <- rnorm(4,14+0,.5)
score <- c(A0,B0,A1,B1,A2,B2,A3,B3)
id <- rep(1:8,times = 4, length = 32)
time <- rep(0:3, each = 8, length = 32)
group <- rep(c("A","B"), times =2, each = 4, length = 32)
df <- data.frame(id = id, group = group, time = time, score = score)
# plots (ggplot2 must be loaded; `fun` replaces the deprecated `fun.y` since ggplot2 3.3.0)
library(ggplot2)
(ggplot(df, aes(x = time, y = score, group = group)) +
stat_summary(fun = "mean", geom = "line", aes(linetype = group)) +
stat_summary(fun = "mean", geom = "point", aes(shape = group), size = 3) +
coord_cartesian(ylim = c(0,18)))
# now place some NAs
df[sample(1:nrow(df), 10, replace = F),"score"] <- NA
df
If I understand this post correctly, in the predictor matrix I should specify the id clustering variable with a -2 and the two fixed predictors time and group with a 1. Like so:
library(mice)
(ini <- mice(df, maxit=0))
(pred <- ini$predictorMatrix)
(pred["score",] <- c(-2, 1, 1, 0))
(imp <- mice(df,
method = c("", "", "", "2l.pan"),
pred = pred,
maxit = 1,
seed = 71152))
What I would like to know is:
-2 designates it as a 'class' variable, but this mice primer suggests that for multilevel models you should create a variable of all 1's in the data frame as a constant, which is then specified as the random intercept via 2 in the predictor matrix. However, this is based on the 2l.norm function rather than the 2l.pan function, so I am not really sure where I stand here. Does the 2l.pan function not require this column, or the specification of random effects?

The pan library doesn't require an intercept term.
You can dig into the function using:
library(pan)
?pan
That said, mice uses a wrapper around pan called mice.impute.2l.pan. With the mice library loaded, you can look at the help for that function. It states that it has a parameter called intercept, which is "[a] Logical [and] determin[es] whether the intercept is automatically added." It is TRUE by default. This is defined as a random intercept by default. I found this out after browsing the R code for the mice wrapper:
if (intercept) {
  x <- cbind(1, as.matrix(x))
  type <- c(2, type)
}
Here, the function parameter type is a "Vector of length ncol(x) identifying random and class variables". The intercept is therefore added by default and defined as a random effect.
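To make the effect of the wrapper snippet quoted above concrete, here is a small base-R sketch that applies the same two statements to a made-up predictor matrix. The objects x, type and intercept are illustrative stand-ins; nothing beyond the two quoted statements is taken from the internals of mice.impute.2l.pan.

```r
# Hypothetical inputs: two predictors, coded 1 (fixed) and 2 (random slope)
x <- data.frame(group = c(0, 0, 1, 1), time = c(0, 1, 0, 1))
type <- c(1, 2)
intercept <- TRUE

if (intercept) {
  x <- cbind(1, as.matrix(x))   # prepend a column of ones (the constant)
  type <- c(2, type)            # code that constant as a random effect
}

ncol(x)   # 3 columns: constant, group, time
type      # 2 1 2
```

The leading 2 in type is what turns the added constant into a random intercept, which is why no column of ones has to be created by hand.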
They do provide an example like the one you described, with a 1 for "x" in the predictor matrix for fixed effects.
It also states for 2l.norm: "The random intercept is automatically added in mice.impute.2l.norm()."
It has a few examples with descriptions. The CRAN documentation for pan might help you.

This answer is probably a bit late for you, but it may help some people who read this in the future:
2l.pan
Below are some details about specifying multilevel imputation models with mice. Because the application is longitudinal, I use the term "persons" to refer to units at Level 2. These are the most relevant arguments for 2l.pan as mentioned in the mice documentation:
type
Vector of length ncol(x) identifying random and class variables. Random effects are identified by a 2. The group variable (only one is allowed) is coded as -2. Random effects also include the fixed effect. If for a covariates X1 group means shall be calculated and included as further fixed effects choose 3. In addition to the effects in 3, specification 4 also includes random effects of X1.
There are 5 different codes you can use in the predictor matrix for variables imputed with 2l.pan. The person identifier is coded as -2 (this is different from 2l.norm). To include predictor variables with fixed or random effects, these variables are coded with 1 or 2, respectively. If coded as 2, the corresponding fixed effect is automatically included.
In addition, 2l.pan offers the codes 3 and 4, which have similar meanings to 1 and 2 but will include an additional fixed effect for the person mean of that variable. This is useful if you're trying to model within- and between-person effects of time-varying predictor variables.
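As a rough illustration of what "the person mean of that variable" refers to, the following base-R sketch splits a made-up time-varying covariate into a between-person part (the person mean) and a within-person part (the deviation from it). The data and the names stress, stress_mean and stress_dev are invented for this example; the sketch does not reproduce how 2l.pan computes these terms internally.

```r
# Hypothetical data: 3 persons, 4 measurement occasions each
d <- data.frame(
  id     = rep(1:3, each = 4),
  stress = c(2, 4, 3, 3,  6, 5, 7, 6,  1, 2, 2, 3)
)

d$stress_mean <- ave(d$stress, d$id)          # person mean (ave() defaults to mean)
d$stress_dev  <- d$stress - d$stress_mean     # within-person deviation

unique(d[, c("id", "stress_mean")])
```

The person mean varies only between persons, while the deviation varies only within persons and sums to zero for each person.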
intercept
Logical determining whether the intercept is automatically added.
By default, 2l.pan includes the intercept as both a fixed and a random effect. For this reason, it is not required to include a constant term in the predictor matrix. If one sets intercept=FALSE, this behavior is changed, and the intercept is dropped from the imputation model.
groupcenter.slope
If TRUE, in case of group means (type is 3 or 4) group mean centering for these predictors are conducted before doing imputations. Default is FALSE.
Using this option, it is possible to center predictor variables around the person mean instead of including the predictor variable "as is" (i.e., without centering). This only applies to variables coded as 3 or 4. For predictors coded as 3, this is not very important because the models with and without centering are identical.
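The equivalence for predictors coded as 3 can be checked with an ordinary regression, used here as a simplified stand-in for the fixed part of the imputation model (an assumption of this sketch; random effects are left out). With simulated data, a model using the raw covariate plus its person mean and a model using the centered covariate plus its person mean give the same fitted values; only the coefficient of the person mean changes.

```r
set.seed(1)
id <- rep(1:10, each = 5)                       # 10 persons, 5 occasions each
x  <- rnorm(50) + rep(rnorm(10), each = 5)      # time-varying covariate
y  <- 1 + 0.5 * x + rnorm(50)

x_mean <- ave(x, id)       # person mean of x
x_cent <- x - x_mean       # person-mean-centered x

fit_raw  <- lm(y ~ x + x_mean)        # covariate "as is" plus person mean
fit_cent <- lm(y ~ x_cent + x_mean)   # centered covariate plus person mean

all.equal(fitted(fit_raw), fitted(fit_cent))   # TRUE
coef(fit_raw)
coef(fit_cent)
```

Both models span the same set of predictors, so they are reparameterizations of each other. This no longer holds once a random slope is attached to the covariate.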
However, when predictor variables are coded as 4 (i.e., with a random slope), then centering alters the meaning of the random effect so that the random slope no longer applies to the variable "as is" but to the within-person deviation of that variable.
In your example, you can include a simple random slope for time as follows:
library(mice)
ini <- mice(df, maxit=0)
# predictor matrix (following 'type')
pred <- ini$predictorMatrix
pred["score",] <- c(-2, 1, 2, 0)
# imputation method
meth <- c("", "", "", "2l.pan")
imp <- mice(df, method=meth, pred=pred, maxit=10, m=10)
In this example, coding time as 3 or 4 wouldn't make a lot of sense because the person means of time are identical for all persons. However, if you have time-varying covariates that you want to include as predictor variables in the imputation model, 3 and 4 can be useful.
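That the person means of time are identical follows directly from the design of the toy data: every person is observed at the same four time points. The sketch below rebuilds only the id and time columns from the question to show this.

```r
id   <- rep(1:8, times = 4)   # 8 persons, as in the toy data
time <- rep(0:3, each = 8)    # 4 occasions per person

tapply(time, id, mean)        # 1.5 for every id
```

A person mean that is constant across persons is collinear with the intercept, so including it as an extra fixed effect adds no information.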
The additional arguments like intercept and groupcenter.slope can be specified directly in the call to mice(), for example:
imp <- mice(df, ..., groupcenter.slope=TRUE)
So, to answer your questions as stated in the post:
Yes, 2l.pan provides a multilevel (or rather two-level) imputation model. The intercept is included as both a fixed and a random effect by default (can be changed with intercept=FALSE) and need not be specified in the predictor matrix (this is in contrast to 2l.norm).
Yes, you can specify random slopes with 2l.pan. To do that, predictors with random slopes are coded as 2 or 4 in the predictor matrix. If coded as 2, the random slope is included. If coded as 4, the random slope is included as well as an additional fixed effect for the person means of that variable. If coded as 4, the meaning of the random slope may be altered by making use of groupcenter.slope=TRUE (see above).
This article also includes some worked examples of how to work with 2l.pan and other functions for multilevel imputation: [Link]