
Data Imputation with KNN, SoftImpute

I wanted to compare the values imputed by MICE, KNN, and SoftImpute from the fancyimpute package. However, when I ran my code, KNN and SoftImpute imputed only 0 for the missing values, whereas MICE produced more sensible values.

from fancyimpute import MICE, KNN, SoftImpute

# Numeric columns as a NumPy array (here only 'Age')
imputed_numerical = train[['Age']].select_dtypes(include='number').values

Age_MICE = MICE().complete(imputed_numerical)
Age_KNN = KNN(k=3).complete(imputed_numerical)
Age_SoftImpute = SoftImpute().complete(imputed_numerical)

I put the results in a dataframe which looks like this:

Not_Imputed MICE    KNN SoftImpute
   22.0    [22.0]  [22.0]  [22.0]
   38.0    [38.0]  [38.0]  [38.0]
   26.0    [26.0]  [26.0]  [26.0]
   35.0    [35.0]  [35.0]  [35.0]
   35.0    [35.0]  [35.0]  [35.0]
   NaN     [29]    [0.0]   [0.0]
   54.0    [54.0]  [54.0]  [54.0]
   2.0     [2.0]   [2.0]   [2.0]
   27.0    [27.0]  [27.0]  [27.0]
   14.0    [14.0]  [14.0]  [14.0]
   4.0     [4.0]   [4.0]   [4.0]
   58.0    [58.0]  [58.0]  [58.0]
   20.0    [20.0]  [20.0]  [20.0]
   39.0    [39.0]  [39.0]  [39.0]
   14.0    [14.0]  [14.0]  [14.0]
   55.0    [55.0]  [55.0]  [55.0]
   2.0     [2.0]   [2.0]   [2.0]
   NaN     [27.6]  [0.0]   [0.0]
   31.0    [31.0]  [31.0]  [31.0]
   NaN     [30]    [0.0]   [0.0]

Question: Why are KNN and SoftImpute only imputing 0 as the completed value?

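The problem is that these are multivariate procedures, but you are passing only one variable (column). MICE fits a regression of each variable on the others. KNN fills a missing value with the average of its k nearest neighbors, where nearness is measured in a multidimensional space in which each dimension is a variable. I'm not sure about SoftImpute, but it is a matrix-completion method, so it very likely needs several columns as well. With a single column, a row whose only value is missing has nothing to be compared on, so KNN cannot find neighbors for it and falls back to 0.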

For example, see this warning message from the KNN procedure:

[KNN] Warning: 3/20 still missing after imputation, replacing with 0

or this warning from SoftImpute:

RuntimeWarning: invalid value encountered in double_scalars
  return (np.sqrt(ssd) / old_norm) < self.convergence_threshold
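
To make the point about neighbors concrete, here is a minimal pure-Python sketch of nearest-neighbor imputation. It is a simplified illustration and not fancyimpute's actual implementation: the function name knn_impute, the sample values, and the second column (a hypothetical fare) are all assumptions made for the example. Distances are computed only over columns that both rows have observed, so with a single column a row with a missing value has no usable neighbors and stays unfilled, which corresponds to the "still missing after imputation" warning above.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest rows (illustrative only).

    The distance between two rows uses only the columns observed in both.
    If no such column exists, the rows cannot be compared; a row with no
    comparable neighbours keeps its missing entries as None.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue  # nothing to measure distance on
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda pair: pair[0])
                nearest = [v for _, v in candidates[:k]]
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# One column (age only): the incomplete row has no observed feature at all.
one_col = [[22.0], [38.0], [None], [35.0]]
print(knn_impute(one_col))

# Two columns (age, hypothetical fare): the fare provides a basis for distance.
two_col = [[22.0, 7.25], [38.0, 71.28], [None, 8.05], [35.0, 53.10]]
print(knn_impute(two_col, k=2))
```

In the one-column case the missing entry is left as None, while in the two-column case it is filled from the two rows whose fare is closest. Passing several related numeric columns to the imputer, rather than Age alone, is what gives KNN something to work with.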
