Can I pool imputed random effect model estimates using the mi package?

It appears that the mi package was substantially rewritten at some point within the past couple of years.

The "old" way of doing things is well outlined in the following tutorial: http://thomasleeper.com/Rcourse/Tutorials/mi.html

The "new" way of doing things (sticking with Leeper's simulation demo) looks something like this:

#load mi
library(mi)
#set seed
set.seed(10)
#simulate some data (with some observations missing)
x1 <- runif(100, 0, 5)
x2 <- rnorm(100)
y <- 2*x1 + 20*x2 + rnorm(100)
mydf <- cbind.data.frame(x1, x2, y)
mydf$x1[sample(1:nrow(mydf), 20, FALSE)] <- NA
mydf$x2[sample(1:nrow(mydf), 10, FALSE)] <- NA

# Convert to a missing_data.frame
mydf_mdf <- missing_data.frame(mydf)

# impute
mydf_imp <- mi(mydf_mdf)

Although function names have changed, this is actually pretty similar to the "old" way of doing things.

The biggest change (from my vantage point) is the replacement of the following "old" functions:

lm.mi(formula, mi.object, ...)

glm.mi(formula, mi.object, family = gaussian, ...)

bayesglm.mi(formula, mi.object, family = gaussian, ...)

polr.mi(formula, mi.object, ...)

bayespolr.mi(formula, mi.object, ...)

lmer.mi(formula, mi.object, rescale=FALSE, ...)

glmer.mi(formula, mi.object, family = gaussian, rescale=FALSE, ...)

Previously, a user could fit a model to each imputed dataset using one of these functions and then pool the results using mi.pooled() (or coef.mi() if we are following the Leeper example).

In the current version of mi (I have v1.0 installed), these last steps appear to have been combined into a single function, pool(). The pool() function appears to read the family and link function that were assigned to a variable during the imputation process above and then estimate a model with bayesglm using the specified formula, as shown below.

# run models on imputed data and pool the results
summary(pool(y ~ x1 + x2, mydf_imp))

## 
## Call:
## pool(formula = y ~ x1 + x2, data = mydf_imp)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.98754  -0.40923   0.03393   0.46734   2.13848  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.34711    0.25979  -1.336    0.215    
## x1           2.07806    0.08738  23.783 1.46e-13 ***
## x2          19.90544    0.11068 179.844  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.7896688)
## 
##     Null deviance: 38594.916  on 99  degrees of freedom
## Residual deviance:    76.598  on 97  degrees of freedom
## AIC: 264.74
## 
## Number of Fisher Scoring iterations: 7

It looks like we are coming close to recovering our simulated beta values (2 and 20). In other words, it is behaving as expected.

Let's take a slightly larger set of data with a naively simulated random effect, just for the sake of getting a grouping variable.

mydf2 <- data.frame(x1 = rep(runif(100, 0, 5), 20)
                   ,x2 = rep(rnorm(100, 0, 2.5), 20)
                   ,group_var = rep(1:20, each = 100)
                   ,noise = rep(rnorm(100), 20))

mydf2$y <- 2*mydf2$x1 + 20*mydf2$x2 + mydf2$noise

mydf2$x1[sample(1:nrow(mydf2), 200, FALSE)] <- NA
mydf2$x2[sample(1:nrow(mydf2), 100, FALSE)] <- NA

# Convert to a missing_data.frame
mydf2_mdf <- missing_data.frame(mydf2)

show(mydf2_mdf)

## Object of class missing_data.frame with 2000 observations on 5 variables
## 
## There are 4 missing data patterns
## 
## Append '@patterns' to this missing_data.frame to access the corresponding pattern for every observation or perhaps use table()
## 
##                 type missing method  model
## x1        continuous     200    ppd linear
## x2        continuous     100    ppd linear
## group_var continuous       0   <NA>   <NA>
## noise     continuous       0   <NA>   <NA>
## y         continuous       0   <NA>   <NA>
## 
##             family     link transformation
## x1        gaussian identity    standardize
## x2        gaussian identity    standardize
## group_var     <NA>     <NA>    standardize
## noise         <NA>     <NA>    standardize
## y             <NA>     <NA>    standardize

Since missing_data.frame() appears to be interpreting group_var as continuous, I use the change() function from mi to reassign its type to "un" (for "unordered categorical") and then proceed as above.

mydf2_mdf <- change(mydf2_mdf, y = "group_var", what = "type", to = "un"  )

# impute
mydf2_imp <- mi(mydf2_mdf)

Now, unless version 1.0 of mi has removed the functionality of previous versions (i.e., the functionality available with lmer.mi and glmer.mi), I would assume that adding a random effect to the formula should point pool() to the appropriate lme4 function. However, the initial error message suggests that this is not the case.

# run models on imputed data and pool the results
summary(pool(y ~ x1 + x2 + (1|group_var), mydf2_imp))
## Warning in Ops.factor(1, group_var): '|' not meaningful for factors
## Warning in Ops.factor(1, group_var): '|' not meaningful for factors
## Error in if (prior.scale[j] < min.prior.scale) {: missing value where TRUE/FALSE needed

Following the warning message and extracting the integers out of my factor does get me an estimate, but the results suggest that pool() is still estimating a fixed-effect model with bayesglm and holding my attempted random effect constant.

summary(pool(y ~ x1 + x2 + (1|as.numeric(as.character(group_var))), mydf2_imp))

## 
## Call:
## pool(formula = y ~ x1 + x2 + (1 | as.numeric(as.character(group_var))), 
##     data = mydf2_imp)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.93633  -0.69923   0.01073   0.56752   2.12167  
## 
## Coefficients:
##                                               Estimate Std. Error  t value
## (Intercept)                                  1.383e-01  2.596e+02    0.001
## x1                                           1.995e+00  1.463e-02  136.288
## x2                                           2.000e+01  8.004e-03 2499.077
## 1 | as.numeric(as.character(group_var))TRUE -3.105e-08  2.596e+02    0.000
##                                             Pr(>|t|)    
## (Intercept)                                        1    
## x1                                            <2e-16 ***
## x2                                            <2e-16 ***
## 1 | as.numeric(as.character(group_var))TRUE        1    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.8586836)
## 
##     Null deviance: 5384205.2  on 1999  degrees of freedom
## Residual deviance:    1713.9  on 1996  degrees of freedom
## AIC: 5377
## 
## Number of Fisher Scoring iterations: 4
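
The coefficient name ending in TRUE hints at what may be happening: outside lme4, `|` is not a formula operator, so a term like (1 | g) is evaluated as an ordinary logical OR. The following is a minimal base R sketch of that behaviour, using a hypothetical toy data frame `d` with a numeric grouping column `g`; it does not use mi or bayesglm, so it only illustrates the formula handling, not what pool() does internally.

```r
set.seed(1)

# Hypothetical toy data: one predictor and a numeric grouping variable
d <- data.frame(x = rnorm(10), g = rep(1:2, each = 5))
d$y <- 2 * d$x + rnorm(10)

# Build the design matrix the way a fixed-effects fitter would.
# The term (1 | g) is evaluated as the logical expression `1 | g`,
# which is TRUE for every row, so it becomes a single constant column.
mm <- model.matrix(y ~ x + (1 | g), data = d)

colnames(mm)            # names of the design-matrix columns
unique(mm[, ncol(mm)])  # distinct values in the column built from (1 | g)
```

A constant column like this is collinear with the intercept, which would be consistent with the huge standard errors on both the intercept and the extra term in the output above.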

My questions are:

  1. Is it possible to easily generate pooled random effects estimates using mi, and
  2. If yes, how?

Just to provide an alternative, there's a package that focuses quite a bit on MI for mixed-effects models as well as on pooling the results obtained from them (mitml).

Using the package is pretty straightforward. It relies on the packages pan and jomo for imputation, but it can also handle input from other MI packages (?as.mitml.list).

Pooling estimates from a mixed-effects model is mostly automated and included in the testEstimates function.

require(mitml)
require(lme4)

data(studentratings)

# impute example data using 'pan'
fml <- ReadDis + SES ~ ReadAchiev + (1|ID)
imp <- panImpute(studentratings, formula=fml, n.burn=1000, n.iter=100, m=5)

implist <- mitmlComplete(imp, print=1:5)

# fit model using lme4
fit.lmer <- with(implist, lmer(SES ~ (1|ID)))

# pool results using 'Rubin's rules'
testEstimates(fit.lmer, var.comp=TRUE)

Output:

# Call:

# testEstimates(model = fit.lmer, var.comp = TRUE)

# Final parameter estimates and inferences obtained from 5 imputed data sets.

#              Estimate Std.Error   t.value        df   p.value       RIV       FMI 
# (Intercept)    46.988     1.119    41.997   801.800     0.000     0.076     0.073 

#                         Estimate 
# Intercept~~Intercept|ID   38.272 
# Residual~~Residual       298.446 
# ICC|ID                     0.114 

# Unadjusted hypothesis test as appropriate in larger samples. 
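
As a quick sanity check on the variance components above, the ICC row can be reproduced by hand. This is only arithmetic on the printed values, assuming the ICC is the usual ratio of the random-intercept variance to the total variance; the variable names are made up for the illustration.

```r
# Pooled variance components copied from the output above
var_intercept <- 38.272   # Intercept~~Intercept|ID
var_residual  <- 298.446  # Residual~~Residual

# Intraclass correlation: share of total variance between groups
icc <- var_intercept / (var_intercept + var_residual)
round(icc, 3)
```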

You can specify the FUN argument to the pool() function to change the estimator. In your case, it would be summary(pool(y ~ x1 + x2 + (1|as.numeric(as.character(group_var))), data = mydf2_imp, FUN = lmer)). That may or may not actually work, but it is legal syntax. If that fails, then you can use the complete function to create completed data.frames, call lmer on each, and average the results yourself, which would be something like

dfs <- complete(mydf2_imp)
estimates <- lapply(dfs, FUN = lmer, formula = y ~ x1 + x2 + (1|as.numeric(as.character(group_var))))
rowMeans(sapply(estimates, FUN = fixef))
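
Averaging gives pooled point estimates but no pooled standard errors. One common way to get those is Rubin's rules, sketched below in base R. The helper `pool_rubin` and the toy numbers are hypothetical and only illustrate the arithmetic; this is not a function from mi or lme4.

```r
# Pool per-imputation results with Rubin's rules.
# est, se: m x p matrices, one row per imputed data set and
# one column per coefficient (hypothetical helper, not part of mi).
pool_rubin <- function(est, se) {
  est <- as.matrix(est)
  se  <- as.matrix(se)
  m   <- nrow(est)
  q_bar <- colMeans(est)             # pooled point estimates
  u_bar <- colMeans(se^2)            # within-imputation variance
  b     <- apply(est, 2, var)        # between-imputation variance
  total <- u_bar + (1 + 1 / m) * b   # total variance
  data.frame(estimate = q_bar, std.error = sqrt(total))
}

# Toy input: three imputations, two coefficients (made-up numbers)
est <- cbind(x1 = c(1.9, 2.0, 2.1), x2 = c(19.8, 20.0, 20.2))
se  <- cbind(x1 = c(0.1, 0.1, 0.1), x2 = c(0.2, 0.2, 0.2))

pooled <- pool_rubin(est, se)
pooled
```

With a list of fitted lmer models, the rows of `est` could come from fixef() on each fit and the rows of `se` from the square roots of the diagonal of each fit's vcov(); that wiring is left out here because it depends on the fitted objects.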
