How to do imputation in R when mice returns the error "system is computationally singular"
I am trying to do imputation on a medium-sized data frame (~100,000 rows) where 5 of its 30 columns have NAs (a large proportion, around 60%).
I tried mice with the following code:
library(mice)
data_3 = complete(mice(data_2))
After the first iteration I got the following error:
iter imp variable
1 1 Existing_EMI Loan_Amount Loan_Period
Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number = 1.08007e-16
Is there another package that is more robust in this kind of situation? How can I deal with this problem?
Your 5 columns might contain a number of unbalanced factors. When these are turned into dummy variables, there is a high probability that one column will be a linear combination of another. The default imputation methods of mice involve linear regression, so this produces an X matrix whose cross-product cannot be inverted, and that is what causes your error (the solve.default(xtx + diag(pen)) call in the message is that inversion).
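A minimal sketch of this failure mode, using hypothetical toy factors grade and band (not the asker's columns): band is the same variable as grade under different labels, so each of its dummy columns duplicates one of grade's. The design matrix then loses rank and inverting X'X fails. This only illustrates the collinearity; it does not call mice.

```r
# Hypothetical toy data, not the asker's columns: a 3-level factor (16/32/16).
grade <- factor(rep(c("A", "B", "C"), times = c(16, 32, 16)))

# 'band' is the same variable under different labels (A -> low, B -> mid, C -> high).
band <- grade
levels(band) <- c("low", "mid", "high")

# Dummy-code both factors, as a regression on them would.
X <- model.matrix(~ grade + band)
print(colnames(X))

# Each 'band' dummy duplicates a 'grade' dummy, so X has dependent columns.
print(all(X[, "bandmid"] == X[, "gradeB"]))
print(qr(X)$rank)  # smaller than ncol(X)

# Consequently X'X is singular and solve() stops with a singularity error.
xtx <- crossprod(X)
msg <- tryCatch({
  solve(xtx)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

Comparing qr(X)$rank with ncol(X) is a quick way to spot dependent dummy columns before imputing; the exact wording of the solve() error depends on whether the dependence is exact, as here, or only numerical, as in the question.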
Change the method being used to something else, such as CART: mice(data_2, method = "cart"). Also check which seed you set before or during imputation, so that the results are reproducible.
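A sketch of both suggestions together, assuming the nhanes dataset that ships with mice as a stand-in for the asker's data_2 (m = 5 and seed = 2024 are arbitrary choices). Passing seed to mice() makes a rerun reproduce the same imputations.

```r
library(mice)

# 'nhanes' ships with mice and stands in for the asker's data_2 here.
imp <- mice(nhanes, method = "cart", m = 5, seed = 2024, printFlag = FALSE)
data_3 <- complete(imp)  # first completed dataset by default

# Re-running with the same seed reproduces the same imputations.
imp_again <- mice(nhanes, method = "cart", m = 5, seed = 2024, printFlag = FALSE)
print(identical(complete(imp_again), data_3))
```

CART does not invert X'X, so exact or near collinearity among predictors does not trigger the singular-matrix error the way the regression-based defaults can.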
My advice is to go through the 7 mice vignettes. They show how to change the imputation method for individual columns instead of for the whole dataset.
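A sketch of a per-column method vector, assuming the nhanes2 dataset shipped with mice in place of the asker's data. make.method() returns mice's default method for every column; only the chosen entry is overridden ('bmi' here is an arbitrary example).

```r
library(mice)

# Start from mice's default method for every column of 'nhanes2'
# (a dataset shipped with mice, standing in for data_2).
meth <- make.method(nhanes2)
print(meth)

# Override only the problematic column(s); 'bmi' is an arbitrary example.
meth["bmi"] <- "cart"

imp <- mice(nhanes2, method = meth, seed = 2024, printFlag = FALSE)
print(imp$method)  # the per-column methods that were actually used
```

For the data in the question this would mean something like meth[c("Existing_EMI", "Loan_Amount", "Loan_Period")] <- "cart", leaving the other columns on their defaults.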