Impute values of a vector using Cosine similarity in Python

The Scenario

I have a dataset whose last column contains NaN values. These need to be imputed using only vector cosine similarity and Pearson correlation, after which the data will be used for clustering.

The Problem

For my case it is mandatory to use vector cosine similarity and Pearson correlation.

Here is a sample of my dataset, post_df1, which is loaded from a CSV file using pandas:

       uid     iid       rat
1    303.0   785.0  3.000000
2    291.0  1042.0  4.000000
3    234.0  1184.0  2.000000
4    102.0   768.0  2.000000
254  944.0   170.0  5.000000
255  944.0   171.0  5.000000
256  944.0   172.0       NaN
257  944.0   173.0       NaN
258  944.0   174.0       NaN

The rating column is then extracted into a vector (just to keep things simple; suggestions are welcome) using this command:

vect_1 = post_df1.iloc[:, 2].values

The Imputer class in sklearn.preprocessing offers only the mean, median, and most-frequent strategies, and none of these fits my scenario.

Questions

  1. Is there any package other than Surprise (by Nicolas Hug) that supports the vector cosine and Pearson methods?
  2. Is it possible to pass a function / method to sklearn for cosine and Pearson?
  3. Is there any other method or way out?

Cosine similarity and Pearson correlation are not imputation methods in themselves; they are similarity measures that an imputation method can take as a parameter. There are various imputation methods, such as KNN, MICE, SVD, and matrix factorization. For example, cosine similarity could be used as the similarity measure in KNN imputation, but I could not find a ready-made implementation of that. The fancyimpute package may be helpful as the closest available implementation. Here is the link: GitHub - hammerlab / fancyimpute: Multivariate imputation and matrix completion algorithms implemented in Python https://github.com/hammerlab/fancyimpute/
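
To illustrate how a similarity measure plugs into KNN imputation, here is a minimal pure-Python sketch. It is not taken from fancyimpute or any other package: the function names (cosine_sim, pearson_sim, knn_impute), the dictionary layout {uid: {iid: rat}} and the toy ratings are all assumptions made for illustration. It also assumes that similarity is computed only over items both users have rated, and that neighbours with non-positive similarity are ignored.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity over the items rated by both users."""
    common = [k for k in a if k in b]
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    na = math.sqrt(sum(a[k] ** 2 for k in common))
    nb = math.sqrt(sum(b[k] ** 2 for k in common))
    return dot / (na * nb) if na and nb else 0.0

def pearson_sim(a, b):
    """Pearson correlation over the items rated by both users."""
    common = [k for k in a if k in b]
    if len(common) < 2:
        return 0.0
    ma = sum(a[k] for k in common) / len(common)
    mb = sum(b[k] for k in common) / len(common)
    num = sum((a[k] - ma) * (b[k] - mb) for k in common)
    da = math.sqrt(sum((a[k] - ma) ** 2 for k in common))
    db = math.sqrt(sum((b[k] - mb) ** 2 for k in common))
    # Undefined when either user rated every common item identically.
    return num / (da * db) if da and db else 0.0

def knn_impute(ratings, uid, iid, sim=cosine_sim, k=2):
    """Estimate ratings[uid][iid] as the similarity-weighted mean of the
    k most similar users who did rate iid. Returns None if no neighbour
    with positive similarity exists."""
    target = ratings[uid]
    neighbours = []
    for other, other_ratings in ratings.items():
        if other == uid or iid not in other_ratings:
            continue
        s = sim(target, other_ratings)
        if s > 0:
            neighbours.append((s, other_ratings[iid]))
    neighbours.sort(reverse=True)
    top = neighbours[:k]
    if not top:
        return None
    return sum(s * r for s, r in top) / sum(s for s, _ in top)

# Toy data (invented): only known ratings are stored; u1 has not rated i4.
ratings = {
    "u1": {"i1": 5.0, "i2": 3.0, "i3": 4.0},
    "u2": {"i1": 4.0, "i2": 2.0, "i3": 3.0, "i4": 4.0},
    "u3": {"i1": 1.0, "i2": 5.0, "i3": 2.0, "i4": 1.0},
    "u4": {"i1": 5.0, "i2": 3.0, "i3": 5.0, "i4": 5.0},
}

cos_estimate = knn_impute(ratings, "u1", "i4", sim=cosine_sim)
pear_estimate = knn_impute(ratings, "u1", "i4", sim=pearson_sim)
print(cos_estimate, pear_estimate)
```

With a frame like post_df1, the nested dictionary could be built from the rows whose rat is not NaN, and each NaN row then filled by calling knn_impute with that row's uid and iid. Note that Pearson correlation gives no usable similarity for a user whose known ratings are all identical, so such rows would need a fallback.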
