[英]Can I pool imputed random effect model estimates using the mi package?
It appears that the mi
package has had a pretty big rewrite at some point within the past couple of years. 似乎
mi
包在过去几年中在某个时刻进行了相当大的重写。
The "old" way of doing things is well-outlined in the following tutorial: http://thomasleeper.com/Rcourse/Tutorials/mi.html 以下教程中详细介绍了“旧”做事方式: http : //thomasleeper.com/Rcourse/Tutorials/mi.html
The "new" way of doing things (sticking with Leeper's simulation demo) looks something like this: 做事的“新”方式(坚持使用Leeper的模拟演示)看起来像这样:
#load mi
library(mi)
#set seed
set.seed(10)
#simulate some data (with some observations missing)
x1 <- runif(100, 0, 5)
x2 <- rnorm(100)
y <- 2*x1 + 20*x2 + rnorm(100)
mydf <- cbind.data.frame(x1, x2, y)
mydf$x1[sample(1:nrow(mydf), 20, FALSE)] <- NA
mydf$x2[sample(1:nrow(mydf), 10, FALSE)] <- NA
# Convert to a missing_data.frame
mydf_mdf <- missing_data.frame(mydf)
# impute
mydf_imp <- mi(mydf_mdf)
Although function names have changed, this is actually pretty similar to the "old" way of doing things. 虽然功能名称已经改变,但这实际上与“旧”的做事方式非常相似。
The biggest change (from my vantage) is the replacement of the following "old" functions 最大的变化(从我的优势)是替换以下“旧”功能
lm.mi(formula, mi.object, ...)
glm.mi(formula, mi.object, family = gaussian, ...)
bayesglm.mi(formula, mi.object, family = gaussian, ...)
polr.mi(formula, mi.object, ...)
bayespolr.mi(formula, mi.object, ...)
lmer.mi(formula, mi.object, rescale=FALSE, ...)
glmer.mi(formula, mi.object, family = gaussian, rescale=FALSE, ...)
. glmer.mi(formula, mi.object, family = gaussian, rescale=FALSE, ...)
。
Previously, a user could compute a model for each imputed dataset using one of these functions and then pool the results using mi.pooled()
(or coef.mi()
if we are following the Leeper example). 以前,用户可以使用其中一个函数为每个插补数据集计算模型,然后使用
mi.pooled()
(或者如果我们遵循Leeper示例, coef.mi()
结果。
In the current version of mi
(I have v1.0 installed), these last steps appear to have been combined into a single function, pool()
. 在当前版本的
mi
(我安装了v1.0)中,这些最后的步骤似乎已合并为一个函数pool()
。 The pool()
function appears to read the family and link function that was assigned to a variable during the imputation process above and then estimate a model with bayesglm
using the specified formula as shown below. pool()
函数似乎读取在上面的插补过程中分配给变量的族和链接函数,然后使用指定的公式估计具有bayesglm
的模型,如下所示。
# run models on imputed data and pool the results
summary(pool(y ~ x1 + x2, mydf_imp))
##
## Call:
## pool(formula = y ~ x1 + x2, data = mydf_imp)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.98754 -0.40923 0.03393 0.46734 2.13848
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.34711 0.25979 -1.336 0.215
## x1 2.07806 0.08738 23.783 1.46e-13 ***
## x2 19.90544 0.11068 179.844 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.7896688)
##
## Null deviance: 38594.916 on 99 degrees of freedom
## Residual deviance: 76.598 on 97 degrees of freedom
## AIC: 264.74
##
## Number of Fisher Scoring iterations: 7
This looks like we are coming close to recovering our simulated beta values (2 and 20). 这看起来我们即将恢复我们的模拟beta值(2和20)。 In other words, it is behaving as expected.
换句话说,它表现得像预期的那样。
Let's take a slightly larger set of data with a naively simulated random effect just for the sake of getting a grouping variable. 让我们采用一个稍微大一点的数据集,只是为了获得一个分组变量而采用天真的模拟随机效应。
mydf2 <- data.frame(x1 = rep(runif(100, 0, 5), 20)
,x2 = rep(rnorm(100, 0, 2.5), 20)
,group_var = rep(1:20, each = 100)
,noise = rep(rnorm(100), 20))
mydf2$y <- 2*mydf2$x1 + 20*mydf2$x2 + mydf2$noise
mydf2$x1[sample(1:nrow(mydf2), 200, FALSE)] <- NA
mydf2$x2[sample(1:nrow(mydf2), 100, FALSE)] <- NA
# Convert to a missing_data.frame
mydf2_mdf <- missing_data.frame(mydf2)
show(mydf2_mdf)
## Object of class missing_data.frame with 2000 observations on 5 variables
##
## There are 4 missing data patterns
##
## Append '@patterns' to this missing_data.frame to access the corresponding pattern for every observation or perhaps use table()
##
## type missing method model
## x1 continuous 200 ppd linear
## x2 continuous 100 ppd linear
## group_var continuous 0 <NA> <NA>
## noise continuous 0 <NA> <NA>
## y continuous 0 <NA> <NA>
##
## family link transformation
## x1 gaussian identity standardize
## x2 gaussian identity standardize
## group_var <NA> <NA> standardize
## noise <NA> <NA> standardize
## y <NA> <NA> standardize
Since missing_data.frame()
appears to be intepreting group_var
as continuous, I use the change()
function from mi
to reassign to "un"
for "unordered categorical" and then proceed as above. 由于
missing_data.frame()
似乎是将group_var group_var
为连续的,因此我使用mi
的change()
函数为“unordered categorical”重新分配给"un"
”,然后按上述步骤操作。
mydf2_mdf <- change(mydf2_mdf, y = "group_var", what = "type", to = "un" )
# impute
mydf2_imp <- mi(mydf2_mdf)
Now, unless version 1.0 of mi
has removed the functionality of previous versions (ie functionality available with lmer.mi
and glmer.mi
), I would assume that the addition of a random effect in the formula should point pool()
to the appropriate lme4
function. 现在,除非
mi
1.0版已经删除了以前版本的功能(即lmer.mi
和glmer.mi
提供的功能),我会假设在公式中添加随机效应应该将pool()
指向适当的lme4
功能。 However, the initial error message suggests that this is not the case. 但是,初始错误消息表明情况并非如此。
# run models on imputed data and pool the results
summary(pool(y ~ x1 + x2 + (1|group_var), mydf2_imp))
## Warning in Ops.factor(1, group_var): '|' not meaningful for factors
## Warning in Ops.factor(1, group_var): '|' not meaningful for factors
## Error in if (prior.scale[j] < min.prior.scale) {: missing value where TRUE/FALSE needed
Following my warning message and extracting the integers out of my factor does get me an estimate, but the results suggest that pool()
is still estimating a fixed-effect model with bayesglm
and holding my attempted random-effect constant. 根据我的警告信息并从我的因子中提取整数确实得到了我的估计,但结果表明
pool()
仍在估计具有bayesglm
的固定效应模型并保持我尝试的随机效应常数。
summary(pool(y ~ x1 + x2 + (1|as.numeric(as.character(group_var))), mydf2_imp))
##
## Call:
## pool(formula = y ~ x1 + x2 + (1 | as.numeric(as.character(group_var))),
## data = mydf2_imp)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.93633 -0.69923 0.01073 0.56752 2.12167
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 1.383e-01 2.596e+02 0.001
## x1 1.995e+00 1.463e-02 136.288
## x2 2.000e+01 8.004e-03 2499.077
## 1 | as.numeric(as.character(group_var))TRUE -3.105e-08 2.596e+02 0.000
## Pr(>|t|)
## (Intercept) 1
## x1 <2e-16 ***
## x2 <2e-16 ***
## 1 | as.numeric(as.character(group_var))TRUE 1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.8586836)
##
## Null deviance: 5384205.2 on 1999 degrees of freedom
## Residual deviance: 1713.9 on 1996 degrees of freedom
## AIC: 5377
##
## Number of Fisher Scoring iterations: 4
My questions are: 我的问题是:
mi
?, and mi
?和?来轻松生成合并的随机效应估计值 Just to provide an alternative, there's a package that focuses quite a bit on MI for mixed-effects models as well as pooling the results obtained from it ( mitml
, find it here ). 只是提供一个替代方案,有一个包关注混合效果模型的MI,以及汇总从它获得的结果(
mitml
, 在这里找到它 )。
Using the package is pretty straightforward. 使用该包非常简单。 It relies on the packages
pan
and jomo
for imputation, but it can also handle input from other MI packages ( ?as.mitml.list
). 它依赖于包
pan
和jomo
进行插补,但它也可以处理来自其他MI包的输入( ?as.mitml.list
)。
Pooling estimates from a mixed-effects model is mostly automatized and included in the testEstimates
function. 混合效应模型的汇总估算大多是自动化的,并包含在
testEstimates
函数中。
require(mitml)
require(lme4)
data(studentratings)
# impute example data using 'pan'
fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula=fml, n.burn=1000, n.iter=100, m=5)
implist <- mitmlComplete(imp, print=1:5)
# fit model using lme4
fit.lmer <- with(implist, lmer(SES ~ (1|ID)))
# pool results using 'Rubin's rules'
testEstimates(fit.lmer, var.comp=TRUE)
Output: 输出:
# Call:
# testEstimates(model = fit.lmer, var.comp = TRUE)
# Final parameter estimates and inferences obtained from 5 imputed data sets.
# Estimate Std.Error t.value df p.value RIV FMI
# (Intercept) 46.988 1.119 41.997 801.800 0.000 0.076 0.073
# Estimate
# Intercept~~Intercept|ID 38.272
# Residual~~Residual 298.446
# ICC|ID 0.114
# Unadjusted hypothesis test as appropriate in larger samples.
You can specify the FUN
argument to the pool()
function to change the estimator. 您可以为
pool()
函数指定FUN
参数以更改估算器。 In your case, it would be summary(pool(y ~ x1 + x2 + (1|as.numeric(as.character(group_var))), data = mydf2_imp, FUN = lmer))
. 在你的情况下,它将是
summary(pool(y ~ x1 + x2 + (1|as.numeric(as.character(group_var))), data = mydf2_imp, FUN = lmer))
。 That may or may not actually work, but it is legal syntax. 这可能会或可能不会实际工作,但它是合法的语法。 If that fails, then you can use the
complete
function to created completed data.frames, call lmer
on each, and average the results yourself, which would be something like dfs <- complete(mydf2_imp) estimates <- lapply(dfs, FUN = lme4, formula = y ~ x1 + x2 + (1|as.numeric(as.character(group_var)))) rowMeans(sapply(estimates, FUN = fixef))
如果失败了,那么你可以使用
complete
函数创建完成的data.frames,在每个上调用lmer
,并自己平均结果,这就像dfs <- complete(mydf2_imp) estimates <- lapply(dfs, FUN = lme4, formula = y ~ x1 + x2 + (1|as.numeric(as.character(group_var)))) rowMeans(sapply(estimates, FUN = fixef))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.