
Effective way to compute cosine similarity for sparse tensors in python?

I have a list of unit tensors (length = 1). The list contains ~20,000 such tensors. Each tensor has ~3,000 dimensions but is very sparse: only a fraction x (0 < x < 1) of the dimensions is nonzero. I need to compute the cosine similarity between all pairs of these tensors. What is the most efficient way to do this? (This is not an NLP task, but my solution looks similar to the word2vec approach, which is why I added the NLP tag. My tensors have more dimensions than word2vec vectors and are sparser.)

See the scikit-learn documentation for the cosine_similarity function:

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html

In Python:

from sklearn.metrics.pairwise import cosine_similarity
# vector1 and vector2 must be 2-D, with shape (n_samples, n_features)
cos_sim = cosine_similarity(vector1, vector2)
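
A minimal sketch of that call for two single vectors. The values of vector1 and vector2 are made up for illustration, since the answer does not define them. cosine_similarity works on 2-D inputs, so each 1-D vector is reshaped into a one-row matrix first:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical example vectors; the answer above does not define them.
vector1 = np.array([1.0, 0.0, 2.0])
vector2 = np.array([0.0, 1.0, 2.0])

# reshape(1, -1) turns each 1-D vector into a single-row 2-D array.
cos_sim = cosine_similarity(vector1.reshape(1, -1), vector2.reshape(1, -1))
print(cos_sim.shape)  # (1, 1)
```

The result is a 1 x 1 matrix; cos_sim[0, 0] holds the similarity value.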

scikit-learn's cosine_similarity is your friend:

from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

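# small example (4 x 3 matrix, 90% of entries nonzero):
T = sparse.rand(4, 3, density=0.9)
cosine_similarity(T)

# full run (sizes as described in the question; sparse.rand defaults to density=0.01):
T = sparse.rand(20000, 3000)
# %time is an IPython/Jupyter magic, not plain Python
%time cosine_similarity(T)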

Takes about 4.4 seconds on my machine.
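
For a sense of scale, the dense result here is a 20,000 x 20,000 array. A rough back-of-the-envelope estimate of its size, assuming float64 entries (8 bytes each):

```python
# Rough size of the dense similarity matrix, assuming float64 entries (8 bytes each).
n_vectors = 20_000
dense_bytes = n_vectors * n_vectors * 8
dense_gb = dense_bytes / 1e9
print(f"{dense_gb:.1f} GB")  # 3.2 GB
```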

# staying sparse:
%time cosine_similarity(T, dense_output=False)

Takes less than 2 seconds on my machine (i.e., roughly a 2x speedup).
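
Since the question says the tensors already have unit length, the cosine similarity of two rows is just their dot product, so a plain sparse matrix product should give the same result. The sketch below is not part of the original answer; it uses a small random matrix as a stand-in (the sizes and the random_state are assumptions for the demo) and normalizes the rows explicitly to mimic unit-length input:

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import normalize

# Small stand-in for the data in the question (sizes are assumptions for the demo).
T = sparse.random(200, 300, density=0.01, format="csr", random_state=0)
T = normalize(T, norm="l2", axis=1)  # scale every non-empty row to unit length

# For unit-length rows, cosine similarity equals the dot product.
S = T @ T.T  # sparse (200, 200) result

# Cross-check against scikit-learn on the same data.
reference = cosine_similarity(T, dense_output=False)
print(np.allclose(S.toarray(), reference.toarray()))
```

If the real data is already normalized, the normalize call can be dropped and only the product T @ T.T remains.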
