R MICE imputation failing
I am really baffled about why my imputation is failing in R's mice package. I am attempting a very simple operation with the following data frame:
dfn <- read.table(text =
"a b c d
0 1 0 1
1 0 0 0
0 0 0 0
NA 0 0 0
0 0 0 NA", header = TRUE)
I then use mice in the following way to perform a simple mean imputation:
library(mice)
imp <- mice(dfn, method = "mean", m = 1, maxit = 1)
filled <- complete(imp)
However, my completed data looks like this:
filled
# a b c d
#1 0.00 1 0 1
#2 1.00 0 0 0
#3 0.00 0 0 0
#4 0.25 0 0 0
#5 0.00 0 0 NA
Why am I still getting this trailing NA? This is the simplest failing example I could construct, but my real data set is much larger and I am just trying to get a sense of where things are going wrong. Any help would be greatly appreciated!
I'm not really sure how accurate this is, but here is an attempt. Even though method="mean" is supposed to impute the unconditional mean, it appears from the documentation that the default predictorMatrix, which still uses every other variable as a predictor, is not changed accordingly.
Normally, leftover NAs occur because the predictors suffer from multicollinearity or because there are too few cases per variable (so that the imputation model cannot be estimated). However, method="mean" should not behave that way, because it does not fit a model on the predictors at all.
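A quick way to see whether constant or collinear columns could be involved in this particular example is to inspect the observed data directly. The sketch below uses base R only; `find_degenerate()` is a hypothetical helper, not a mice function, and the tolerance is an arbitrary assumption. On this data it reports that `c` is constant and that `b` and `d` are identical on the rows where both are observed. That is at least consistent with mice dropping `d` during its own setup checks, but this is an assumption, not something confirmed above; printing `imp$loggedEvents` after the `mice()` call may show whether a variable was actually removed.

```r
# Sketch only: find_degenerate() is a hypothetical helper, not part of mice.
dfn <- read.table(text = "a b c d
0 1 0 1
1 0 0 0
0 0 0 0
NA 0 0 0
0 0 0 NA", header = TRUE)

find_degenerate <- function(df, tol = 1e-8) {
  # Variance of the observed values; (near-)zero variance means a constant column.
  v <- sapply(df, var, na.rm = TRUE)
  constant <- names(v)[which(v < tol)]

  # Pairwise correlations among the remaining columns, each pair using only
  # the rows where both columns are observed.
  keep <- setdiff(names(df), constant)
  r <- suppressWarnings(cor(df[keep], use = "pairwise.complete.obs"))

  # Report each perfectly correlated pair once (upper triangle only).
  hits <- which(abs(r) > 1 - tol & upper.tri(r), arr.ind = TRUE)
  collinear <- paste(rownames(r)[hits[, "row"]], colnames(r)[hits[, "col"]],
                     sep = "~")

  list(constant = constant, collinear = collinear)
}

find_degenerate(dfn)
# expected: constant = "c", collinear = "b~d"
```

The check only looks at pairs of columns, so it would miss a variable that is a linear combination of several others.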
Here is what I did:
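library(mice)
dfn <- read.table(text = "a b c d
0 1 0 1
1 0 0 0
0 0 0 0
NA 0 0 0
0 0 0 NA", header = TRUE)
# diagonal matrix: every off-diagonal entry is 0, so no variable is used
# as a predictor for another
imp <- mice(dfn, method = "mean", predictorMatrix = diag(ncol(dfn)))
complete(imp)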
# 1 0.00 1 0 1.00
# 2 1.00 0 0 0.00
# 3 0.00 0 0 0.00
# 4 0.25 0 0 0.00
# 5 0.00 0 0 0.25
You can try this on your actual data set, but you should check the results carefully. For example, compute the observed mean of each variable:
sapply(dfn, function(x) mean(x, na.rm = TRUE))
The value imputed into each variable's missing cells should be identical to that variable's observed mean. Please let me know if this solves your problem.
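To make that comparison mechanical instead of by eye, a small base-R helper can check, column by column, that every originally missing cell now holds that column's observed mean and that no NA is left. `check_mean_imputed()` is hypothetical, not a mice function. The two completed data frames are typed in from the outputs shown in this thread (the question's result and this answer's result), so the sketch runs without mice installed; with mice loaded you would pass `complete(imp)` as the second argument instead.

```r
# Sketch only: check_mean_imputed() is a hypothetical helper, not part of mice.
check_mean_imputed <- function(original, completed, tol = 1e-8) {
  # Mean of the observed (non-NA) values in each column.
  observed_means <- sapply(original, function(x) mean(x, na.rm = TRUE))
  sapply(names(original), function(col) {
    miss <- is.na(original[[col]])
    filled <- completed[[col]][miss]
    # TRUE when no gap is left and every former gap holds the column mean.
    !anyNA(filled) && all(abs(filled - observed_means[[col]]) < tol)
  })
}

dfn <- read.table(text = "a b c d
0 1 0 1
1 0 0 0
0 0 0 0
NA 0 0 0
0 0 0 NA", header = TRUE)

# Completed data copied from the two outputs shown in this thread.
from_question <- data.frame(a = c(0, 1, 0, 0.25, 0), b = c(1, 0, 0, 0, 0),
                            c = c(0, 0, 0, 0, 0), d = c(1, 0, 0, 0, NA))
from_answer <- data.frame(a = c(0, 1, 0, 0.25, 0), b = c(1, 0, 0, 0, 0),
                          c = c(0, 0, 0, 0, 0), d = c(1, 0, 0, 0, 0.25))

check_mean_imputed(dfn, from_question)  # d is FALSE: its NA was never filled
check_mean_imputed(dfn, from_answer)    # all TRUE
```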