Perform operation on each imputed dataset in R's MICE
How can I perform an operation (like subsetting or adding a calculated column) on each imputed dataset in an object of class mids from R's package mice? I would like the result to still be a mids object.
Edit: Example
library(mice)
data(nhanes)
# create imputed datasets
imput = mice(nhanes)
The imputed datasets are stored as a list of lists, one element per variable:
imput$imp
where there are rows only for the observations with imputation for the given variable.
The original (incomplete) dataset is stored here:
imput$data
For example, how would I create a new variable calculated as chl/2 in each of the imputed datasets, yielding a new mids object?
This can be done easily as follows:
Use complete() to convert a mids object to a long-format data.frame:
long1 <- complete(midsobj1, action='long', include=TRUE)
Perform whatever manipulations are needed:
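# add a calculated column
long1$new.var <- long1$chl / 2
# keep a subset of rows; in nhanes, age is coded 1-3, so a cutoff of 5 would drop every row
long2 <- subset(long1, age >= 2)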
Use as.mids() to convert the manipulated data back to a mids object:
midsobj2 <- as.mids(long2)
Now you can use midsobj2 as required. Note that include=TRUE (used to include the original data with missing values) is needed for as.mids() to compress the long-formatted data properly. Note also that prior to mice v2.25 there was a bug in the as.mids() function (see this post: https://stats.stackexchange.com/a/158327/69413).
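As a minimal end-to-end sketch of the steps above, assuming the built-in nhanes data from the question: the seed, the object names imp1 and imp2, and the column name half_chl are illustrative choices, not part of the original answer.

```r
library(mice)

# Impute the nhanes data (m = 5 imputed datasets by default)
imp1 <- mice(nhanes, seed = 123, printFlag = FALSE)

# Stack the original data (.imp == 0) on top of the completed datasets
long1 <- complete(imp1, action = "long", include = TRUE)

# Add the derived column; it stays NA in the .imp == 0 rows where chl is missing
long1$half_chl <- long1$chl / 2

# Convert back; as.mids() uses the .imp and .id columns created by complete()
imp2 <- as.mids(long1)

# The new column is now part of every completed dataset
head(complete(imp2, 1))
```

Because the .imp == 0 rows carry the missing-data pattern, as.mids() can work out which cells of half_chl count as imputed.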
EDIT: According to this answer https://stackoverflow.com/a/34859264/4269699 (from what is essentially a duplicate question) you can also edit the mids object directly by accessing $data and $imp. So for example:
midsobj2<-midsobj1
midsobj2$data$new.var <- midsobj2$data$chl/2
midsobj2$imp$new.var <- midsobj2$imp$chl/2
You will run into trouble though if you want to subset $imp or if you want to use $call, so I wouldn't recommend this solution in general.
Another option is to calculate the variables before the imputation and place restrictions on them.
library(mice)
# Create the additional variable; it is missing wherever chl is missing
nhanes$extra <- nhanes$chl / 2
# Dry run (maxit = 0) to obtain the default methods and predictor matrix
ini <- mice(nhanes, maxit = 0, printFlag = FALSE)
# Change the method of imputation for extra, so that it always equals chl/2
meth <- ini$method
meth["extra"] <- "~I(chl / 2)"
# Change the predictor matrix so only chl predicts extra
pred <- ini$predictorMatrix # extra isn't used to predict
pred["extra", "chl"] <- 1
# Imputations
imput <- mice(nhanes, seed = 1, predictorMatrix = pred, method = meth, printFlag = FALSE)
There are examples in "mice: Multivariate Imputation by Chained Equations in R".
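One way to check that passive imputation of this kind keeps the derived variable consistent is to compare it with chl/2 in a completed dataset. The sketch below is an illustration under stated assumptions: it works on a copy called dat rather than modifying nhanes, and it zeroes the extra column of the predictor matrix so that extra is never used to impute other variables. That is a choice made here, not something taken from the answer.

```r
library(mice)

# Work on a copy so the built-in nhanes data stays untouched
dat <- nhanes
dat$extra <- dat$chl / 2

# Dry run to obtain default settings
ini <- mice(dat, maxit = 0, printFlag = FALSE)

# Passive imputation: extra is recomputed from chl instead of being modelled
meth <- ini$method
meth["extra"] <- "~ I(chl / 2)"

# Assumption: keep extra out of every imputation model to avoid feedback
pred <- ini$predictorMatrix
pred[, "extra"] <- 0

imp <- mice(dat, method = meth, predictorMatrix = pred,
            seed = 1, printFlag = FALSE)

# In a completed dataset, extra should match chl / 2 row by row
c1 <- complete(imp, 1)
isTRUE(all.equal(as.numeric(c1$extra), c1$chl / 2))
```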
There is an overload of with that can help you here:
with(imput, chl/2)
The documentation is given at ?with.mids
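One caveat worth checking before relying on this: as far as I can tell, with() on a mids object returns a mira object (one result per imputed dataset) rather than a new mids object, so it suits analyses better than building a modified mids. A small sketch, assuming the nhanes data from the question; the seed and the regression model are illustrative only:

```r
library(mice)

imp <- mice(nhanes, seed = 1, printFlag = FALSE)

# Evaluate an expression once per completed dataset
res <- with(imp, chl / 2)
class(res)            # inspect the class of the result
length(res$analyses)  # one element per imputation

# Typical use: fit a model on each completed dataset, then pool the fits
fit <- with(imp, lm(bmi ~ I(chl / 2)))
pooled <- pool(fit)
summary(pooled)
```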
There's a function for this in the basecamb package:
library(basecamb)
apply_function_to_imputed_data(mids_object, function)