Imputation in R when mice returns the error "system is computationally singular"

Question

I am trying to impute a medium-sized data frame (~100,000 rows) in which 5 of the 30 columns have NAs (a large proportion, around 60%).

I tried mice with the following code:

library(mice)    
data_3 = complete(mice(data_2))

After the first iteration I got the following error:

iter imp variable
  1   1  Existing_EMI  Loan_Amount  Loan_Period

Error in solve.default(xtx + diag(pen)): system is computationally singular: reciprocal condition number = 1.08007e-16

Is there another package that is more robust in this kind of situation? How can I deal with this problem?

Answer 1

Your 5 columns might contain a number of unbalanced factors. When these are turned into dummy variables, there is a high probability that one column will be a linear combination of others. The default imputation methods of mice involve linear regression, so the cross-product matrix of X (xtx in the error message) cannot be inverted, which causes your error.
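To make the linear-dependence point concrete, here is a minimal base R sketch. The data frame and its column names (region, office) are hypothetical and are not taken from the question; the idea is only to show how dummy columns built from factors can end up rank deficient, and one way to spot the offending columns.

```r
# Hypothetical toy data: every office sits in exactly one region,
# so the office dummies carry no information beyond the region dummies.
df <- data.frame(
  region = factor(c("north", "north", "south", "south", "east", "east")),
  office = factor(c("A", "A", "B", "B", "C", "C"))
)

# Build the dummy-coded design matrix a linear regression would use.
X <- model.matrix(~ region + office, data = df)

# Compare the numerical rank with the number of columns.
qr_X   <- qr(X)
rank_X <- qr_X$rank
n_cols <- ncol(X)
rank_X < n_cols   # TRUE means some columns are linear combinations of others

# Base R's QR decomposition moves columns it judges dependent to the end
# of the pivot order, so these are candidates for dropping or recoding.
dependent <- colnames(X)[qr_X$pivot[seq_len(n_cols) > rank_X]]
dependent
```

Running the same check on the dummy-coded version of your own predictors may show which factor levels are responsible before you call mice.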

Change the method to something else, such as cart: mice(data_2, method = "cart"). Also set a seed before or during imputation so that the results are reproducible.
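A minimal sketch of that suggestion follows. It uses the small nhanes data set that ships with mice as a stand-in for data_2, which is an assumption made purely so the example is self-contained; the seed value 123 is arbitrary.

```r
library(mice)

# nhanes (bundled with mice) stands in for the asker's data_2.
# method = "cart" uses classification and regression trees for every
# incomplete column; seed makes the random draws reproducible.
imp <- mice(nhanes, method = "cart", m = 5, seed = 123, printFlag = FALSE)

# Extract the first completed data set.
completed <- mice::complete(imp)
anyNA(completed)
```

Passing seed to mice() fixes the random number stream for that call, so rerunning the same line should give the same imputations.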

My advice is to go through the 7 vignettes of mice. There you can find out how to change the imputation method for individual columns instead of for the whole dataset.
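As a rough illustration of per-column methods, the sketch below again uses the bundled nhanes data as a stand-in (an assumption, not the asker's data). It takes the default method vector from a dry run with maxit = 0 and overrides a single column; the choice of chl is arbitrary.

```r
library(mice)

# Dry run with zero iterations: nothing is imputed, but the returned
# object contains the default method chosen for each column.
ini  <- mice(nhanes, maxit = 0, printFlag = FALSE)
meth <- ini$method

# Override the method for one column only; the others keep their defaults.
meth["chl"] <- "cart"

imp <- mice(nhanes, method = meth, seed = 123, printFlag = FALSE)
imp$method
```

Columns with no missing values have an empty method string and are left alone, so only the columns that actually need imputation have to be considered.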

Answer 1: 9, 2018-01-20 12:43:24
