I need to find a Python function that works like this R function:
proxy::simil(method = "cosine", by_rows = FALSE)
i.e. it builds a similarity matrix by computing the cosine similarity pairwise between dataframe rows. If NaNs are present, then for each pair of rows it should drop exactly those columns that contain a NaN in either of the two rows.
(See the R documentation for the simil function.)
Update: I have also tried removing the NaNs for every pair of rows in a loop, using the cosine function from scipy.spatial.distance. It gives the same result as R, but it takes ages :(
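A minimal sketch of that per-pair loop, assuming the data is a 2-D NumPy array with NaN for missing values. It uses plain NumPy instead of scipy.spatial.distance.cosine (which returns the distance, i.e. 1 minus the similarity), and the function name is hypothetical. The Python-level loop over all row pairs is what makes this approach slow on large inputs.

```python
import itertools
import numpy as np

def nan_cosine_similarity_loop(X):
    """Hypothetical helper: pairwise cosine similarity between rows of X,
    dropping, for each pair, every column where either row is NaN."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sim = np.eye(n)  # diagonal set to 1 by convention
    for i, j in itertools.combinations(range(n), 2):
        # Keep only the columns present in both rows
        keep = ~(np.isnan(X[i]) | np.isnan(X[j]))
        a, b = X[i, keep], X[j, keep]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        # No shared non-zero data: similarity is undefined, store NaN
        sim[i, j] = sim[j, i] = a @ b / denom if denom > 0 else np.nan
    return sim
```

For example, with `X = [[1, 2, nan], [2, 4, 1], [nan, 1, 1]]`, rows 1 and 2 are compared only on their last two columns.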
You can try this approach: https://github.com/Midnighter/nadist. Alternatively, you can use _chk_weights with nan_screen=True, as described by metaperture here: https://github.com/scipy/scipy/issues/3870. Hope that helps.
I found that Midnighter had previously posted the same problem on Stack Overflow: "Compute the pairwise distance in scipy with missing values". There are some other solutions there, but since he moved on to Cythonizing it, I bet they were not the best.
I solved the problem by creating a mask (a boolean array indicating which values are missing) and calculating pairwise cosine similarities between the row vectors of the matrix. This gave me a long vector of similarities, which I then pivoted to get the similarity matrix.
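A minimal sketch of a mask-based version, assuming a 2-D NumPy array with NaN for missing values. Unlike the answer above, it does not build a long vector and pivot it. Instead it uses a swapped-in matrix formulation that computes the whole similarity matrix with three matrix products. The function name is hypothetical. Zeroing the NaNs makes them drop out of the dot products, and multiplying squared values by the other row's mask restricts each norm to the columns the two rows share.

```python
import numpy as np

def nan_cosine_similarity(X):
    """Hypothetical helper: pairwise cosine similarity between rows of X,
    ignoring, for each pair, any column where either row is NaN."""
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)            # True where a value is present
    X0 = np.where(mask, X, 0.0)    # NaNs -> 0 so they contribute nothing

    # dot[i, j] = sum of x_ik * x_jk over columns present in both rows
    dot = X0 @ X0.T
    # sq[i, j] = sum of x_ik**2 over columns where row j is also present
    sq = (X0 ** 2) @ mask.T.astype(float)
    # sq.T[i, j] is the matching squared norm of row j on the shared columns
    denom = np.sqrt(sq * sq.T)

    # Pairs with no shared non-zero data get NaN instead of a division by zero
    sim = np.full_like(dot, np.nan)
    np.divide(dot, denom, out=sim, where=denom > 0)
    return sim
```

The trade-off is memory: it builds several n-by-n intermediates, so for very many rows a blocked or condensed (pairs-only) computation may be preferable.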
You could replace the NaNs with 0 and then compute the cosine similarity.
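A minimal sketch of the zero-fill idea, assuming a 2-D NumPy array (the function name is hypothetical). Note that this is not the same as dropping the NaN columns per pair: a zero still counts toward the other row's norm, so the results differ from the pairwise-deletion behaviour described in the question. For the rows `[2, 4, 1]` and `[nan, 1, 1]`, zero-filling gives 5/sqrt(42) ≈ 0.77, whereas dropping the first column gives 5/sqrt(34) ≈ 0.86.

```python
import numpy as np

def zero_filled_cosine_similarity(X):
    """Hypothetical helper: cosine similarity between rows after NaN -> 0."""
    X0 = np.nan_to_num(np.asarray(X, dtype=float), nan=0.0)  # NaN -> 0
    norms = np.linalg.norm(X0, axis=1)
    denom = np.outer(norms, norms)
    # All-zero rows would divide by zero; leave those entries as NaN
    sim = np.full(denom.shape, np.nan)
    np.divide(X0 @ X0.T, denom, out=sim, where=denom > 0)
    return sim
```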