I have a dataset whose last column contains NaN values that need to be imputed using only vector cosine similarity and Pearson correlation; the imputed data will then be used for clustering.
It is mandatory in my case to use vector cosine and Pearson correlation.
Here is a chunk of my dataset, post_df1, which is read from a CSV file using pandas:
uid iid rat
1 303.0 785.0 3.000000
2 291.0 1042.0 4.000000
3 234.0 1184.0 2.000000
4 102.0 768.0 2.000000
254 944.0 170.0 5.000000
255 944.0 171.0 5.000000
256 944.0 172.0 NaN
257 944.0 173.0 NaN
258 944.0 174.0 NaN
The rating column is then taken into a vector (just to make things easy; suggestions are welcome) using this command:
vect_1 = post_df1.iloc[:, 2].values
sklearn.preprocessing has a class called Imputer, which offers mean, median and most-frequent strategies, but none of them fits my scenario.
Cosine similarity and Pearson correlation are not imputation methods themselves; they are only parameters (similarity measures) that an imputation method can use. There are various imputation methods, such as KNN, MICE, SVD and matrix factorization. For example, cosine similarity could serve as the similarity measure in a KNN-based imputation, but I could not find an existing implementation of exactly that.
The fancyimpute package may be helpful, since it implements something close. Here is the link: GitHub - hammerlab/fancyimpute: Multivariate imputation and matrix completion algorithms implemented in Python https://github.com/hammerlab/fancyimpute/
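If no ready-made implementation fits, the KNN idea can be hand-rolled in a few lines. The following is only a rough sketch, not fancyimpute's API: it assumes each (uid, iid) pair occurs at most once, pivots the ratings into a user-by-item matrix, and fills each missing rating with the similarity-weighted mean of the k most similar users who rated that item. The function names `similarity` and `impute_ratings`, the parameter `k`, and the toy data are all made up for illustration. Missing ratings with no positively similar neighbour are left as NaN.

```python
import numpy as np
import pandas as pd


def similarity(a, b, metric="cosine"):
    """Similarity of two rating vectors, computed over co-rated items only."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if mask.sum() < 2:          # too little overlap to compare
        return 0.0
    x, y = a[mask], b[mask]
    if metric == "pearson":     # Pearson = cosine of the mean-centred vectors
        x = x - x.mean()
        y = y - y.mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def impute_ratings(df, metric="cosine", k=2):
    """Return a user-by-item matrix with NaNs filled from the k nearest users."""
    # Assumption: (uid, iid) pairs are unique, otherwise pivot() raises.
    matrix = df.pivot(index="uid", columns="iid", values="rat")
    values = matrix.to_numpy(dtype=float)
    filled = values.copy()
    for u, i in zip(*np.where(np.isnan(values))):
        # Candidate neighbours: users who actually rated item i.
        raters = np.where(~np.isnan(values[:, i]))[0]
        sims = np.array([similarity(values[u], values[v], metric) for v in raters])
        keep = sims > 0         # ignore dissimilar or incomparable users
        raters, sims = raters[keep], sims[keep]
        if sims.size == 0:
            continue            # nothing to base a prediction on; stays NaN
        top = np.argsort(sims)[-k:]
        filled[u, i] = sims[top] @ values[raters[top], i] / sims[top].sum()
    return pd.DataFrame(filled, index=matrix.index, columns=matrix.columns)


# Toy data in the same uid / iid / rat layout as the question.
example = pd.DataFrame({
    "uid": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "iid": [10, 20, 30, 10, 20, 30, 10, 20, 30],
    "rat": [5.0, 3.0, 4.0, 5.0, 3.0, np.nan, 1.0, 5.0, 2.0],
})
cosine_filled = impute_ratings(example, metric="cosine", k=2)
pearson_filled = impute_ratings(example, metric="pearson", k=2)
```

The filled matrix can be passed to a clustering step directly, or melted back into the long uid / iid / rat form with `DataFrame.stack()`. Note that similarities computed from very few co-rated items are unreliable, so a minimum-overlap threshold higher than 2 may be sensible on real data.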