R MICE imputation failing

I am really baffled about why my imputation is failing in R's mice package. I am attempting a very simple operation with the following data frame:

dfn <- read.table(text =
"a b c  d
 0 1 0  1
 1 0 0  0
 0 0 0  0
NA 0 0  0
 0 0 0 NA", header = TRUE)

I then use mice in the following way to perform a simple mean imputation:

library(mice)

# Single imputation (m = 1) with one iteration, using the unconditional mean
imp <- mice(dfn, method = "mean", m = 1, maxit = 1)
filled <- complete(imp)

However, my completed data looks like this:

filled
#     a b c  d
#1 0.00 1 0  1
#2 1.00 0 0  0
#3 0.00 0 0  0
#4 0.25 0 0  0
#5 0.00 0 0 NA

Why am I still getting this trailing NA? This is the simplest failing example I could construct, but my real data set is much larger, and I am just trying to get a sense of where things are going wrong. Any help would be greatly appreciated!

I'm not really sure how accurate this is, but here is an attempt. Even though method="mean" is supposed to impute the unconditional mean, it appears from the documentation that the predictorMatrix is not changed accordingly.

Normally, leftover NAs occur because the predictors suffer from multicollinearity or because there are too few cases per variable (so that the imputation model cannot be estimated). However, method="mean" shouldn't behave that way.
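To see whether the toy data has the kind of problem described above, one can look for constant columns and for perfectly correlated pairs using base R only. This is just a diagnostic sketch, not a confirmed explanation of what mice does internally; the names n_distinct, constant_cols and r_bd are made up for illustration.

```r
# Same toy data as in the question.
dfn <- read.table(text = "a b c d
0 1 0 1
1 0 0 0
0 0 0 0
NA 0 0 0
0 0 0 NA", header = TRUE)

# Number of distinct observed (non-NA) values per column.
n_distinct <- sapply(dfn, function(x) length(unique(x[!is.na(x)])))

# Columns with only one distinct observed value are constant.
constant_cols <- names(n_distinct)[n_distinct == 1]

# Correlation between b and d over the rows where both are observed.
r_bd <- cor(dfn$b, dfn$d, use = "complete.obs")

print(constant_cols)
print(r_bd)
```

Here column c is constant, and b and d agree on every row where both are observed, so the data is degenerate in both senses. If mice dropped anything for such a reason, it would normally be recorded in imp$loggedEvents, which is worth inspecting on the real data set.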

Here is what I did:

dfn <- read.table(text="a b c  d
 0 1 0  1
 1 0 0  0
 0 0 0  0
NA 0 0  0
 0 0 0 NA", header=TRUE)

library(mice)

# An explicit predictorMatrix is passed instead of relying on the default
imp <- mice(dfn, method = "mean", predictorMatrix = diag(ncol(dfn)))
complete(imp)

#      a b c    d
# 1 0.00 1 0 1.00
# 2 1.00 0 0 0.00
# 3 0.00 0 0 0.00
# 4 0.25 0 0 0.00
# 5 0.00 0 0 0.25

You can try this using your actual data set, but you should check the results carefully. For example, do:

sapply(dfn, function(x) mean(x,na.rm=TRUE))

The mean of each variable should be identical to the value imputed for that variable. Please let me know if this solves your problem.
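As a cross-check that does not depend on mice at all, the unconditional mean imputation can be done by hand in base R and compared with the completed data. This is a minimal sketch on the toy data from the question; col_means and manual are illustrative names, not part of any package.

```r
# Same toy data as in the question.
dfn <- read.table(text = "a b c d
0 1 0 1
1 0 0 0
0 0 0 0
NA 0 0 0
0 0 0 NA", header = TRUE)

# Observed mean of every column, ignoring missing values.
col_means <- colMeans(dfn, na.rm = TRUE)

# Replace each NA with the observed mean of its own column.
manual <- dfn
for (v in names(manual)) {
  manual[[v]][is.na(manual[[v]])] <- col_means[[v]]
}

print(col_means)
print(manual)

# If `filled` holds the result of complete(imp), the two can be compared with:
# all.equal(manual, filled, check.attributes = FALSE)
```

Any cell that is still NA in the mice result but filled in here points to a variable that mice did not impute.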
