
Combine imputed and non-imputed data

I have a question about merging datasets after multiple imputation. I have created an example to explain my problem:

id <- c(1,2,3,4,5,6,7,8,9,10)
age <- c(60,NA,90,55,60,61,77,67,88,90)
bmi <- c(30,NA,NA,23,24,NA,27,23,26,21)
time <- c(62,88,85,NA,68,62,89,62,70,99)
dat <- data.frame(id, age, bmi, time)
dat

id <- c(1,2,3,4,5,6,7,8,9,10)
m1 <- c(60,78,90,55,60,61,77,67,88,90)
m2 <- c(30,44,35,23,24,22,27,23,26,21)
m3 <- c(62,88,85,78,68,62,89,62,70,99)
dat2 <- data.frame(id, m1, m2, m3)
dat2

I have two datasets, dat and dat2. The dataset dat contains missing values, so I use multiple imputation (package mice) to impute it:

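library(mice)
# dry run (maxit = 0) to get the default imputation methods and predictor matrix
impdat <- mice(dat, maxit = 0)
methdat <- impdat$method
preddat <- impdat$predictorMatrix
# keep id out of the imputation model (row: id is not imputed; column: id is not a predictor)
preddat["id",] <- 0
preddat[,"id"] <- 0
impdat <- mice(dat, method = methdat, predictorMatrix = preddat,
               seed = 2018, maxit = 10, m = 5)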

Now I want to merge the imputed dataset impdat with the dataset dat2, but that is where my problem arises. I tried the following:

completedat <- complete(impdat, include = T, action = 'long')
finaldat <- merge(completedat, dat2, by = "id")

finaldat <- as.mids(finaldat)
  Error in `[<-.data.frame`(`*tmp*`, j, value = c(61, 88)) : replacement has 2 rows, data has 1  

However, this gives me an error message. The merge itself is successful: the merged data frame finaldat is what I want. The problem is that I cannot transform it back into a mids object.
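A likely cause, sketched in base R on a small hypothetical stand-in for the long data (no mice needed to run it): merge() sorts its result by the key, here id, so the rows of the different imputations end up interleaved. as.mids() appears to expect the rows grouped in .imp blocks, starting with the original data (.imp = 0), with the same row order inside each block. Re-sorting by .imp and .id after the merge restores that layout. The names long and extra and all values are made up for illustration.

```r
# Hypothetical stand-in for complete(impdat, action = "long", include = TRUE):
# the original data (.imp = 0) plus m = 2 imputations of 3 rows, stacked by .imp
long <- data.frame(
  .imp = rep(0:2, each = 3),
  .id  = rep(1:3, times = 3),
  id   = rep(1:3, times = 3),
  bmi  = c(30, NA, 23, 30, 24, 23, 30, 26, 23)
)
# Hypothetical stand-in for dat2 (fully observed, one row per id)
extra <- data.frame(id = 1:3, m1 = c(60, 78, 90))

merged <- merge(long, extra, by = "id")
# merge() sorts by id, so .imp no longer forms contiguous blocks
is.unsorted(merged$.imp)   # TRUE

# Put the rows back into .imp blocks, in .id order within each block
merged <- merged[order(merged$.imp, merged$.id), ]
rownames(merged) <- NULL
merged$.imp                # 0 0 0 1 1 1 2 2 2

# With the real objects, the next step would be:
# finaldat <- mice::as.mids(merged)
```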

I know I can add the variables from dat2 one by one. That does work:

completedat <- complete(impdat, include = T, action = 'long')
completedat$m1 <- dat2$m1
finaldat2 <- as.mids(completedat)

In this example that is fine, because dat2 has only 4 variables. In my real data, however, there are approximately 200 variables that I want to add to the multiply imputed dataset, so I hope there is an easier way to add them all at once. Can somebody help me?
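One possible way to add all of dat2's columns in a single step, sketched in base R on hypothetical toy data: instead of merge(), look up each long-format row's id in dat2 with match(). This copies every non-id column at once while leaving the row order of the long data untouched, which is what the one-by-one assignment above relies on implicitly (there, a 10-element column is recycled over the stacked imputations). The sketch assumes id uniquely identifies the rows of dat2; ids without a match would get NA. The names long, extra, newvars and rows are illustrative only.

```r
# Hypothetical stand-in for complete(impdat, action = "long", include = TRUE)
long <- data.frame(
  .imp = rep(0:2, each = 3),
  .id  = rep(1:3, times = 3),
  id   = rep(1:3, times = 3),
  age  = c(60, NA, 90, 60, 78, 90, 60, 61, 90)
)
# Hypothetical stand-in for dat2, deliberately in a different row order
extra <- data.frame(id = c(3, 1, 2), m1 = c(90, 60, 78), m2 = c(35, 30, 44))

# Every column of extra except the key (with the real data, ~200 columns)
newvars <- setdiff(names(extra), "id")
# For each row of long, the position of its id in extra
rows <- match(long$id, extra$id)
# Copy all new columns at once; the row order of long is preserved
long[newvars] <- extra[rows, newvars]

# With the real objects, the next step would be:
# finaldat <- mice::as.mids(long)
```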

Wouldn't cbind work, provided that you just want to combine imputed and non-imputed data?

id <- c(1,2,3,4,5,6,7,8,9,10)
age <- c(60,NA,90,55,60,61,77,67,88,90)
bmi <- c(30,NA,NA,23,24,NA,27,23,26,21)
time <- c(62,88,85,NA,68,62,89,62,70,99)
dat <- data.frame(id, age, bmi, time)
dat

id <- c(1,2,3,4,5,6,7,8,9,10)
m1 <- c(60,78,90,55,60,61,77,67,88,90)
m2 <- c(30,44,35,23,24,22,27,23,26,21)
m3 <- c(62,88,85,78,68,62,89,62,70,99)
dat2 <- data.frame(id, m1, m2, m3)
dat2

# install.packages("mice")
library(mice)
impdat <- mice(dat, 
               seed = 2018, 
               maxit = 10, 
               m = 5)
impdat
# Class: mids
# Number of multiple imputations:  5 
# Imputation methods:
#   id   age   bmi  time 
# "" "pmm" "pmm" "pmm" 
# PredictorMatrix:
#   id age bmi time
# id    0   1   1    1
# age   1   0   1    1
# bmi   1   1   0    1
# time  1   1   1    0

# note: with its default action, complete() returns only the first completed dataset
impdat = complete(impdat)
impdat

#    id age bmi time
# 1   1  60  30   62
# 2   2  60  24   88
# 3   3  90  24   85
# 4   4  55  23   89
# 5   5  60  24   68
# 6   6  61  24   62
# 7   7  77  27   89
# 8   8  67  23   62
# 9   9  88  26   70
# 10 10  90  21   99

final_data = cbind(impdat, dat2)
final_data
#    id age bmi time id m1 m2 m3
# 1   1  60  30   62  1 60 30 62
# 2   2  60  24   88  2 78 44 88
# 3   3  90  24   85  3 90 35 85
# 4   4  55  23   89  4 55 23 78
# 5   5  60  24   68  5 60 24 68
# 6   6  61  24   62  6 61 22 62
# 7   7  77  27   89  7 77 27 89
# 8   8  67  23   62  8 67 23 62
# 9   9  88  26   70  9 88 26 70
# 10 10  90  21   99 10 90 21 99
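Because cbind() pairs rows purely by position, a slightly more defensive sketch checks that the id columns line up before binding and drops the second id column, which otherwise appears twice in the output above. The data frames completed and extra are small hypothetical stand-ins for the completed data and dat2.

```r
# Hypothetical stand-ins for the completed data and dat2
completed <- data.frame(id = 1:3, age = c(60, 60, 90), bmi = c(30, 24, 24))
extra <- data.frame(id = 1:3, m1 = c(60, 78, 90), m3 = c(62, 88, 85))

# cbind() matches rows by position only, so make sure the ids agree first
stopifnot(identical(completed$id, extra$id))
# Drop the duplicate id column before binding
final_data <- cbind(completed, extra[setdiff(names(extra), "id")])
final_data
```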


I ran into the same issue. In my case, the imputed and non-imputed datasets had different numbers of observations. To fix this, after merging the data, I recoded the variable .id. The mice package creates .id when you call mice and then complete(..., action = 'long'). It is different from your data frame variable id, but the two can be made to correspond with the following code.

library(dplyr)
# mydata: the merged long-format data (e.g. finaldat from the question)
# recode .id based on value of id
mydata <- mutate(mydata, .id = as.numeric(as.factor(id)))
# this step is important according to the mice manual
mydata <- mydata[order(mydata$.imp, mydata$.id),]

The as.mids function worked for me after I applied this recode, and I hope it works for you, too.
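For readers not using dplyr, the same recode can be sketched in base R. The toy data below are hypothetical: they mimic a merged long dataset in which only ids 1, 3 and 5 survived the merge (for example because the other ids are absent from the second data frame), so the ids are no longer consecutive. as.numeric(as.factor(id)) maps the sorted unique ids to 1, 2, 3, and the final sort puts the rows back into .imp blocks.

```r
# Hypothetical merged long data: ids 1, 3 and 5 in two blocks (.imp 0 and 1),
# still in the id-sorted order that merge() produces
mydata <- data.frame(
  id   = c(1, 1, 3, 3, 5, 5),
  .imp = c(0L, 1L, 0L, 1L, 0L, 1L),
  .id  = c(1L, 1L, 3L, 3L, 5L, 5L),
  bmi  = c(30, 30, NA, 25, 24, 24)
)

# Base-R version of mutate(mydata, .id = as.numeric(as.factor(id))):
# the sorted unique ids (1, 3, 5) become consecutive codes (1, 2, 3)
mydata$.id <- as.numeric(as.factor(mydata$id))
# Group rows by imputation, then by the recoded .id
mydata <- mydata[order(mydata$.imp, mydata$.id), ]
mydata$.id   # 1 2 3 1 2 3
```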

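Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.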

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM