简体   繁体   English

自动比较小鼠glm.mids的嵌套模型

[英]Automatically compare nested models from mice's glm.mids

I have a multiply-imputed model from R's mice package in which there are lots of factor variables. 我有一个来自R's mice包装的多重插补模型,其中有很多因子变量。 For example: 例如:

library(mice)
library(Hmisc)

# turn all the variables into factors
fake = nhanes
fake$age = as.factor(nhanes$age)
fake$bmi = cut2(nhanes$bmi, g=3) 
fake$chl = cut2(nhanes$chl, g=3) 

head(fake)
  age         bmi hyp       chl
1   1        <NA>  NA      <NA>
2   2 [20.4,25.5)   1 [187,206)
3   1        <NA>   1 [187,206)
4   3        <NA>  NA      <NA>
5   1 [20.4,25.5)   1 [113,187)
6   3        <NA>  NA [113,187)

imput = mice(nhanes)

# big model
fit1 = glm.mids((hyp==2) ~ age + bmi + chl, data=imput, family = binomial)

I want to test the significance of each entire factor variable in the model (not the indicator variables for each level) by testing the full model against each possible nested model that drops one variable at a time. 我想对在某时刻下降一个变量每个可能的嵌套模型中测试完整模型来测试模型(不是每个级别的指标变量)的每个因子变量的意义。 Manually, I can do: 手动,我可以做:

# small model (no chl)
fit2 = glm.mids((hyp==2) ~ age + bmi, data=imput, family = binomial)

# extract p-value from pool.compare
pool.compare(fit1, fit2)$pvalue

How can I do this automatically for all the factor variables in my model? 如何为模型中的所有因子变量自动执行此操作? The very helpful function drop1 was suggested to me for a previous question -- now I want to do something exactly like that except for the mice case. 我对上一个问题建议了非常有用的功能drop1 ,现在我想做的事情完全一样,除了mice案。

Possibly helpful note: An annoying feature of pool.compare is that it appears to want the "extra" variables in the larger model to be placed after the ones that are shared with the smaller model. 可能有用的注释: pool.compare一个令人讨厌的功能是它似乎希望将较大模型中的“额外”变量放置在与较小模型共享的变量之后。

You can use a loop to iterate through the different combinations of predictors, after arranging them in the order required for pool.compare . 在按pool.compare所需的顺序排列预测变量的不同组合之后,可以使用循环来迭代它们。

So using your fake data from above - tweaked the number of categories 因此,从上方使用您的fake数据-调整了类别数

library(mice)
library(Hmisc)
# turn all the variables into factors
# turn all the variables into factors
fake <- nhanes
fake$age <- as.factor(nhanes$age)
fake$bmi <- cut2(nhanes$bmi, g=2) 
fake$chl <- cut2(nhanes$chl, g=2) 

# Impute
imput <- mice(fake, seed=1)

# Create models 
# - reduced models with one variable removed
# - full models with extra variables at end of expression
vars <- c("age", "bmi", "chl")

red <- combn(vars, length(vars)-1 , simplify=FALSE)
diffs <- lapply(red, function(i) setdiff(vars, i) )
(full <- lapply(1:length(red), function(i) 
                            paste(c(red[[i]], diffs[[i]]), collapse=" + ")))
#[[1]]
#[1] "age + bmi + chl"

#[[2]]
#[1] "age + chl + bmi"

#[[3]]
#[1] "bmi + chl + age"

(red <- combn(vars, length(vars)-1 , FUN=paste, collapse=" + "))
#[1] "age + bmi" "age + chl" "bmi + chl"

The models are now in the correct order to pass to the glm call. 现在,这些模型以正确的顺序传递给glm调用。 I've also replaced glm.mids method as it has been replaced by with.mids - see ?glm.mids 我还替换了glm.mids方法,因为它已被替换为with.mids请参阅?glm.mids

out <- vector("list", length(red))

for( i in 1:length(red)) {

  redMod <-  with(imput, 
               glm(formula(paste("(hyp==2) ~ ", red[[i]])), family = binomial))

  fullMod <-  with(imput, 
               glm(formula(paste("(hyp==2) ~ ", full[[i]])), family = binomial))

  out[[i]] <- list(predictors = diffs[[i]], 
                   pval = c(pool.compare(fullMod, redMod)$pvalue))
   }

do.call(rbind.data.frame, out)
#    predictors      pval
#2         chl 0.9976629
#21        bmi 0.9985028
#3         age 0.9815831

# Check manually by leaving out chl
mod1 <- with(imput, glm((hyp==2) ~ age + bmi + chl , family = binomial))
mod2 <- with(imput, glm((hyp==2) ~ age + bmi , family = binomial))
pool.compare(mod1, mod2)$pvalue
#         [,1]
#[1,] 0.9976629

You will get a lot of warnings using this dataset 使用此数据集,您将收到很多警告

EDIT 编辑

You could wrap this in a function 您可以将其包装在函数中

impGlmDrop1 <- function(vars, outcome, Data=imput,  Family="binomial") 
{

  red <- combn(vars, length(vars)-1 , simplify=FALSE)
  diffs <- lapply(red, function(i) setdiff(vars, i))
  full <- lapply(1:length(red), function(i) 
                      paste(c(red[[i]], diffs[[i]]), collapse=" + "))
  red <- combn(vars, length(vars)-1 , FUN=paste, collapse=" + ")

  out <- vector("list", length(red))
  for( i in 1:length(red)) {

  redMod <-  with(Data, 
              glm(formula(paste(outcome, red[[i]], sep="~")), family = Family))
  fullMod <-  with(Data, 
              glm(formula(paste(outcome, full[[i]], sep="~")), family = Family))
  out[[i]] <- list(predictors = diffs[[i]], 
                   pval = c(pool.compare(fullMod, redMod)$pvalue)  )
  }
  do.call(rbind.data.frame, out)
}

# Run
impGlmDrop1(c("age", "bmi", "chl"), "(hyp==2)")

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM