Error in missing value imputation using MICE package
I have a huge dataset (4M x 17) that has missing values. Two columns are categorical; the rest are numerical. I want to use the MICE package for missing value imputation. This is what I tried:
> testMice <- mice(myData[1:100000,]) # runs fine
> testTot <- predict(testMice, myData)
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "mids"
Running the imputation on the whole dataset was computationally expensive, so I ran it on only the first 100K observations. I am now trying to use that output to impute the whole dataset.
Is there anything wrong with my approach? If yes, what should I do to make it correct? If no, then why am I getting this error?
Neither mice nor Hmisc provides the parameter estimates from the imputation process. Both Amelia and imputeMulti do. In both cases, you can extract the parameter estimates and use them to impute your other observations.
Amelia assumes your data are distributed as multivariate normal, e.g. X \sim N(\mu, \Sigma). imputeMulti assumes that your data follow a multivariate multinomial distribution; that is, the complete cell counts are distributed as X \sim M(n, \theta), where n is the number of observations.
Fitting can be done as follows, using example data. Examining the parameter estimates is shown further below.
library(Amelia)
library(imputeMulti)
data(tract2221, package= "imputeMulti")
test_dat2 <- tract2221[, c("gender", "marital_status","edu_attain", "emp_status")]
# fitting
IM_EM <- multinomial_impute(test_dat2, "EM",conj_prior = "non.informative", verbose= TRUE)
amelia_EM <- amelia(test_dat2, m= 1, noms= c("gender", "marital_status","edu_attain", "emp_status"))
The parameter estimates for the amelia function are found in amelia_EM$mu and amelia_EM$theta.
The parameter estimates for imputeMulti are found in IM_EM@mle_x_y and can be accessed via the get_parameters method. imputeMulti has noticeably higher imputation accuracy for categorical data relative to any of the other three packages, though it only accepts multinomial (e.g. factor) data.
All of this information is in the currently unpublished vignette for imputeMulti. The paper has been submitted to JSS, and I am awaiting a response before adding the vignette to the package.
You don't use predict() with mice. It's not a model you're fitting per se. Your imputed results are already there for the 100,000 rows.
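To make the point above concrete: the imputed values live inside the mids object returned by mice(), and complete() from the mice package pulls them out as an ordinary data frame. A minimal sketch, assuming the nhanes example data that ships with mice stands in for the 100,000-row subset; the small m and maxit values are only there to keep the run short, not a recommendation:

```r
library(mice)

# nhanes is a small example data set bundled with mice; it stands in
# for myData[1:100000, ] from the question.
imp <- mice(nhanes, m = 2, maxit = 2, printFlag = FALSE, seed = 1)

# A mids object stores the imputations themselves, not a predictive model,
# which is why there is no predict() method for it.
class(imp)

# Extract the first completed data set (observed values plus imputations).
completed <- complete(imp, 1)
head(completed)
```

Changing the second argument of complete() selects a different one of the m imputed data sets.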
If you want data for all rows, then you have to put all rows in mice. I wouldn't recommend it though, unless you set it up on a large cluster with dozens of CPU cores.