Automatically compare nested models from mice's glm.mids
I have a multiply-imputed model from R's mice package in which there are lots of factor variables. For example:
library(mice)
library(Hmisc)
# turn all the variables into factors
fake = nhanes
fake$age = as.factor(nhanes$age)
fake$bmi = cut2(nhanes$bmi, g=3)
fake$chl = cut2(nhanes$chl, g=3)
head(fake)
#   age         bmi hyp       chl
# 1   1        <NA>  NA      <NA>
# 2   2 [20.4,25.5)   1 [187,206)
# 3   1        <NA>   1 [187,206)
# 4   3        <NA>  NA      <NA>
# 5   1 [20.4,25.5)   1 [113,187)
# 6   3        <NA>  NA [113,187)
# impute the factor version of the data
imput = mice(fake)
# big model
fit1 = glm.mids((hyp==2) ~ age + bmi + chl, data=imput, family = binomial)
I want to test the significance of each entire factor variable in the model (not the indicator variables for each level) by testing the full model against each possible nested model that drops one variable at a time. Manually, I can do:
# small model (no chl)
fit2 = glm.mids((hyp==2) ~ age + bmi, data=imput, family = binomial)
# extract p-value from pool.compare
pool.compare(fit1, fit2)$pvalue
How can I do this automatically for all the factor variables in my model? The very helpful function drop1 was suggested to me for a previous question. Now I want to do exactly the same thing, but for the mice case.
Possibly helpful note: An annoying feature of pool.compare is that it appears to want the "extra" variables in the larger model to be placed after the ones that are shared with the smaller model.
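The ordering constraint in the note above can be checked before any models are fitted. The following is a minimal base R sketch; `shared_terms_first` is a hypothetical helper (it is not part of mice) and it only looks at main-effect term order, assuming that term order is what matters to pool.compare.

```r
# Hypothetical helper (not part of mice): TRUE when the terms of the smaller
# model are exactly the leading terms of the larger model.
shared_terms_first <- function(full, reduced) {
  full_terms <- attr(terms(full), "term.labels")
  red_terms  <- attr(terms(reduced), "term.labels")
  length(red_terms) < length(full_terms) &&
    identical(full_terms[seq_along(red_terms)], red_terms)
}

# Extra term (bmi) comes last in the larger model
ok  <- shared_terms_first((hyp == 2) ~ age + chl + bmi, (hyp == 2) ~ age + chl)
# Extra term (bmi) sits in the middle of the larger model
bad <- shared_terms_first((hyp == 2) ~ age + bmi + chl, (hyp == 2) ~ age + chl)
```

Here `ok` describes a pair written in the order the note describes, while `bad` describes the same two models with the extra term in the wrong position.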
You can use a loop to iterate through the different combinations of predictors, after arranging them in the order required for pool.compare.
So, using your fake data from above (with the number of categories tweaked):
library(mice)
library(Hmisc)
# turn all the variables into factors
fake <- nhanes
fake$age <- as.factor(nhanes$age)
fake$bmi <- cut2(nhanes$bmi, g=2)
fake$chl <- cut2(nhanes$chl, g=2)
# Impute
imput <- mice(fake, seed=1)
# Create models
# - reduced models with one variable removed
# - full models with extra variables at end of expression
vars <- c("age", "bmi", "chl")
red <- combn(vars, length(vars)-1 , simplify=FALSE)
diffs <- lapply(red, function(i) setdiff(vars, i) )
(full <- lapply(1:length(red), function(i)
    paste(c(red[[i]], diffs[[i]]), collapse=" + ")))
#[[1]]
#[1] "age + bmi + chl"
#[[2]]
#[1] "age + chl + bmi"
#[[3]]
#[1] "bmi + chl + age"
(red <- combn(vars, length(vars)-1 , FUN=paste, collapse=" + "))
#[1] "age + bmi" "age + chl" "bmi + chl"
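As a cross-check on the orderings above, the same strings can be generated by looping over the variable to drop rather than calling combn twice. This is only a sketch in base R; the name `specs` and its column names are made up for illustration.

```r
vars <- c("age", "bmi", "chl")

# One row per dropped variable; the dropped term is always written last
# in the full model so that the shared terms come first.
# rev(vars) is used only to reproduce the order of the combn() output.
specs <- do.call(rbind, lapply(rev(vars), function(drop) {
  keep <- setdiff(vars, drop)
  data.frame(dropped = drop,
             reduced = paste(keep, collapse = " + "),
             full    = paste(c(keep, drop), collapse = " + "),
             stringsAsFactors = FALSE)
}))
specs
```

Each row keeps the dropped variable next to its reduced and full right-hand sides, which makes it harder for the three pieces to get out of step.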
The models are now in the correct order to pass to the glm call. I've also replaced glm.mids with with.mids, which supersedes it (see ?glm.mids).
out <- vector("list", length(red))
for( i in 1:length(red)) {
  redMod <- with(imput,
                 glm(formula(paste("(hyp==2) ~ ", red[[i]])), family = binomial))
  fullMod <- with(imput,
                  glm(formula(paste("(hyp==2) ~ ", full[[i]])), family = binomial))
  out[[i]] <- list(predictors = diffs[[i]],
                   pval = c(pool.compare(fullMod, redMod)$pvalue))
}
do.call(rbind.data.frame, out)
# predictors pval
#2 chl 0.9976629
#21 bmi 0.9985028
#3 age 0.9815831
# Check manually by leaving out chl
mod1 <- with(imput, glm((hyp==2) ~ age + bmi + chl , family = binomial))
mod2 <- with(imput, glm((hyp==2) ~ age + bmi , family = binomial))
pool.compare(mod1, mod2)$pvalue
# [,1]
#[1,] 0.9976629
You will get a lot of warnings using this dataset.
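The result table printed earlier carries row names 2, 21 and 3. If those are distracting, one option is to build the data frame column by column instead of using do.call(rbind.data.frame, ...). A minimal sketch in base R follows; the list `out` is re-created by hand here from the p-values printed above so that the snippet runs without mice.

```r
# Stand-in for the list built in the loop; values copied from the output above
out <- list(
  list(predictors = "chl", pval = 0.9976629),
  list(predictors = "bmi", pval = 0.9985028),
  list(predictors = "age", pval = 0.9815831)
)

# Pull each field out as a typed vector, then assemble the table
res <- data.frame(
  predictors = vapply(out, function(x) x$predictors, character(1)),
  pval       = vapply(out, function(x) x$pval, numeric(1)),
  stringsAsFactors = FALSE
)
res
```

vapply also fails loudly if an element is missing a field or has the wrong type, which rbind would not necessarily do.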
EDIT
You could wrap this in a function:
impGlmDrop1 <- function(vars, outcome, Data=imput, Family="binomial")
{
  red <- combn(vars, length(vars)-1 , simplify=FALSE)
  diffs <- lapply(red, function(i) setdiff(vars, i))
  full <- lapply(1:length(red), function(i)
    paste(c(red[[i]], diffs[[i]]), collapse=" + "))
  red <- combn(vars, length(vars)-1 , FUN=paste, collapse=" + ")
  out <- vector("list", length(red))
  for( i in 1:length(red)) {
    redMod <- with(Data,
                   glm(formula(paste(outcome, red[[i]], sep="~")), family = Family))
    fullMod <- with(Data,
                    glm(formula(paste(outcome, full[[i]], sep="~")), family = Family))
    out[[i]] <- list(predictors = diffs[[i]],
                     pval = c(pool.compare(fullMod, redMod)$pvalue) )
  }
  do.call(rbind.data.frame, out)
}
# Run
impGlmDrop1(c("age", "bmi", "chl"), "(hyp==2)")