
Has anyone tried to parallelize multiple imputation in the 'mice' package?

I'm aware that the Amelia R package provides some support for parallel multiple imputation (MI). However, preliminary analysis of my study's data revealed that the data are not multivariate normal, so, unfortunately, I can't use Amelia. Consequently, I've switched to the mice R package for MI, as this package can perform MI on data that are not multivariate normal.

Since the MI process via mice is very slow (currently I'm using an AWS m3.large 2-core instance), I've started wondering whether it's possible to parallelize the procedure to save processing time. Based on my review of the mice documentation and the corresponding JSS paper, as well as mice's source code, it appears that the package currently doesn't support parallel operations. This is unfortunate, because IMHO the MICE algorithm is naturally parallel and, thus, a parallel implementation should be relatively easy and would yield significant savings in both time and resources.

Question: Has anyone tried to parallelize MI in the mice package, either externally (via R's parallel facilities) or internally (by modifying the source code), and what are the results, if any? Thank you!

Recently, I've tried to parallelize multiple imputation (MI) via the mice package externally, that is, by using R's multiprocessing facilities, in particular the parallel package, which comes standard with the base R distribution. Basically, the solution is to use the mclapply() function to distribute a pre-calculated share of the total number of needed imputations to each core, and then combine the resulting imputed data into a single object. Performance-wise, the results of this approach exceeded my most optimistic expectations: the processing time decreased from 1.5 hours to under 7 minutes (!), and that's on only two cores. I've removed one multilevel factor, but it shouldn't have much effect. Regardless, the result is remarkable!
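The approach described above can be sketched roughly as follows. This is a minimal illustration, not the exact code from my project: it uses mice's bundled nhanes data set as a stand-in for the real data, and the core and imputation counts are placeholders. Each worker runs an independent mice() call with its own seed, and the per-worker mids objects are folded together with ibind().

```r
library(mice)      # multiple imputation; also provides ibind()
library(parallel)  # mclapply() ships with base R

data(nhanes, package = "mice")    # stand-in for the study data

n_cores <- 2                      # e.g. one worker per core
n_imputations <- 10               # total number of imputed data sets
per_core <- n_imputations / n_cores

# Run an independent share of the imputations on each core.
# Distinct seeds keep the workers' random streams from coinciding.
# Note: mclapply() forks, so on Windows it falls back to serial
# execution unless mc.cores = 1.
imp_list <- mclapply(seq_len(n_cores), function(i) {
  mice(nhanes, m = per_core, printFlag = FALSE, seed = i)
}, mc.cores = n_cores)

# Combine the per-core mids objects pairwise into a single one
imp <- Reduce(ibind, imp_list)
```

The resulting imp object behaves like the output of a single mice() call with m = 10, so it can be passed to with() and pool() as usual.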
