How to use a multiply imputed data set (mids) object in a foreach loop in R?

I am trying to use parallel computation to compute percentile bootstrap 95% confidence intervals for the parameters of a least absolute deviations regression, as explained in this article. However, I am not using a single data frame but a multiply imputed data set (mids) object, obtained with the mice package for multiple imputation. This is where the problem lies.

I would like to use the mids object (or a list of multiply imputed data sets) in a foreach loop, perform the bootstrapping, and pool the results. I managed to get results based on a single data set by converting the mids object into a list and then using one element of that list. Nonetheless, I would like to use all data sets at once.

A reproducible example:

library(foreach)
library(doParallel)
cores_2_use <- detectCores() - 1

cl <- makeCluster(cores_2_use)
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl)

library(mice)
imp_merged <-
  foreach(no = 1:cores_2_use, 
          .combine = ibind, 
          .export = "nhanes",
          .packages = "mice") %dopar%
  {
    mice(nhanes, m = 30, printFlag = FALSE)
  }
stopCluster(cl)

And here is what I have tried:

library(quantreg)
library(mitml)
library(miceadds)
library(splines)

cl <- makeCluster(cores_2_use)
clusterSetRNGStream(cl, 9956)
registerDoParallel(cl)

boot.1 <- foreach(i = 1:100,
                  .combine = rbind,
                  .packages = c('quantreg', 'mice', 'mitml', 'splines')) %dopar% {
                    
                    longlist <- miceadds::mids2datlist(imp_merged)
                    boot_dat <- longlist[[6]][sample(1:nrow(longlist[[6]]), replace = TRUE), ]
                    ## This is now based only on the 6th element of longlist
                    ## I would like to use the whole mids/longlist object (330 data sets on my PC)
                    
                    fit1 <- rq(chl ~ ns(bmi, df = 2, B = c(21, 33)) +
                                 hyp + age, tau = 0.5,
                               data = boot_dat)
                    fit1$coef
                  }
stopCluster(cl)

boot.1.df <- as.data.frame(boot.1)
boot.1.pooled <- do.call(cbind, boot.1.df)
boot.1.ci <- apply(boot.1.pooled, 2, quantile, probs = c(0.025, 0.975))
t(boot.1.ci)

I converted the mids object into a list of multiply imputed data sets with longlist <- miceadds::mids2datlist(imp_merged) and performed the sampling based on a single element (i.e., one imputed data set) of that list through boot_dat <- longlist[[6]][sample(1:nrow(longlist[[6]]), replace = TRUE), ]. I would like to use the whole mids object or all elements of longlist.

Any help will be much appreciated!

One possible way is to simply combine the data sets into one big data set and to sample from it directly.

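## Stack all imputed data sets into one data frame
longlist_ <- longlist[[1]]
for (j in 2:length(longlist))
  {
    longlist_ <- rbind(longlist_, longlist[[j]])
  }
## Draw as many rows as one imputed data set has, from the whole stack
boot_dat <- longlist_[sample(nrow(longlist_), size = nrow(longlist[[1]]), replace = TRUE), ]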
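The stacking idea can be tried without mice on a toy list of data frames. The sketch below uses only base R; toy_list is a hypothetical stand-in for longlist, and the column names are made up for illustration.

```r
# Toy stand-in for a list of imputed data sets (hypothetical data):
# three data frames with five rows each, tagged by the set they came from.
set.seed(1)
toy_list <- lapply(1:3, function(k) {
  data.frame(id = 1:5, x = rnorm(5), set = k)
})

# Stack all data sets into one data frame (3 * 5 = 15 rows).
stacked <- do.call(rbind, toy_list)

# One bootstrap replicate: draw as many rows as a single data set has,
# with replacement, from the whole stack.
n <- nrow(toy_list[[1]])
boot_dat <- stacked[sample(nrow(stacked), size = n, replace = TRUE), ]

# The set column shows which data sets the sampled rows came from.
table(boot_dat$set)
```

Sampling row indices from nrow(stacked) rather than from the row count of a single data set is what lets every imputed data set contribute to the replicate.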

Another way is to randomly choose a data set, randomly choose a row from it, and repeat this as many times as there are rows in one data set.

boot_dat = NULL
for (j in seq(nrow(longlist[[6]])))
  {
    boot_dat = rbind(boot_dat, 
               longlist[[sample(length(longlist),1)]][sample(nrow(longlist[[1]]),1),])
  }
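A vectorised variant of the same idea draws all data set indices and row indices up front instead of growing boot_dat inside a loop. This is only a sketch in base R on hypothetical toy data (toy_list stands in for longlist). When every imputed data set has the same number of rows, picking a data set uniformly and then a row uniformly gives each stacked row the same selection probability as sampling from the stacked data directly, so the two approaches should behave alike.

```r
# Toy stand-in for a list of imputed data sets (hypothetical data).
set.seed(2)
toy_list <- lapply(1:3, function(k) {
  data.frame(id = 1:5, x = rnorm(5), set = k)
})
n <- nrow(toy_list[[1]])

# For each of the n bootstrap rows, pick a data set and a row within it.
set_idx <- sample(length(toy_list), n, replace = TRUE)
row_idx <- sample(n, n, replace = TRUE)

# Extract the chosen rows and bind them once at the end.
picked <- Map(function(s, r) toy_list[[s]][r, , drop = FALSE],
              set_idx, row_idx)
boot_dat <- do.call(rbind, picked)
```

Binding once with do.call(rbind, ...) avoids copying the growing data frame on every iteration, which matters when the loop runs inside each of many bootstrap replicates.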

Note that to avoid the "Singular design matrix" error in rq, a small amount of noise can be added.

boot_dat[,'hyp'] = boot_dat[,'hyp'] + runif(nrow(boot_dat), -1e-10, 1e-10)
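Putting the pieces together, a sequential sketch of the stacked bootstrap might look as follows. Several things here are assumptions rather than part of the original code: it uses mice::complete(imp, action = "all") instead of miceadds::mids2datlist, m = 5 imputations and 200 replicates to keep the run short, replicate() in place of the foreach loop, and a tryCatch that returns NA for any replicate where rq still fails. The names stacked and one_draw are hypothetical.

```r
library(mice)
library(quantreg)
library(splines)

set.seed(9956)
imp <- mice(nhanes, m = 5, printFlag = FALSE)  # small m, for illustration only
datlist <- complete(imp, action = "all")       # list of completed data frames
stacked <- do.call(rbind, datlist)             # all imputed data sets stacked
n <- nrow(nhanes)

# One bootstrap replicate: resample n rows from the stack, jitter hyp,
# fit the median regression, and return its coefficients.
one_draw <- function() {
  boot_dat <- stacked[sample(nrow(stacked), size = n, replace = TRUE), ]
  boot_dat$hyp <- boot_dat$hyp + runif(n, -1e-10, 1e-10)
  fit <- tryCatch(
    rq(chl ~ ns(bmi, df = 2, B = c(21, 33)) + hyp + age,
       tau = 0.5, data = boot_dat),
    error = function(e) NULL  # assumption: skip replicates that fail
  )
  if (is.null(fit)) rep(NA_real_, 5) else coef(fit)
}

# Rows are replicates, columns are the five model coefficients.
boot_coefs <- t(replicate(200, one_draw()))

# Percentile bootstrap 95% intervals, ignoring failed replicates.
ci <- t(apply(boot_coefs, 2, quantile, probs = c(0.025, 0.975), na.rm = TRUE))
ci
```

To parallelise this, the body of one_draw would go inside the foreach loop from the question, with stacked built once beforehand and made available to the workers.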
