
Calculating cosine similarity of columns of a python matrix

I have a NumPy matrix, say a, as below:

array([[1, 2, 3],
       [1, 2, 2]])

I want to find the cosine similarity matrix of this matrix, where the cosine similarity is computed between the columns.

The cosine similarity of two vectors is just their dot product divided by the product of their L2 norms.

But I don't want to iterate over the columns in a loop to do it.

So I first tried this:

from scipy.spatial import distance 
cos=distance.cdist(a.T,a.T,'cosine')

I take the transpose here because otherwise it would compute the cosine between rows (observations); I want it between columns.

However, I am not sure this is the right answer. The documentation for this function says it returns 1 - cosine_similarity. So should I do this instead?

cos = 1 - distance.cdist(a.T, a.T, 'cosine')

Please advise.

II)

Also, what if I try doing something like this:

cos=(np.dot(a.T,a))/(np.linalg.norm(a, axis=0, keepdims=True))*(np.linalg.norm(a, axis=0, keepdims=True))

It doesn't work; there seems to be a problem with matching each L2 norm to the right column. Any idea how to implement this without a library function?
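For reference, a likely reason the expression above misbehaves is operator precedence combined with the shape of the denominator. The sketch below reuses the example matrix from the question; the names gram, attempt, denom and cos_sim are only illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 2, 2]])
gram = a.T @ a                                 # pairwise dot products of columns, shape (3, 3)
n = np.linalg.norm(a, axis=0, keepdims=True)   # column L2 norms, shape (1, 3)

# `/` and `*` have equal precedence and evaluate left to right,
# so this divides by n and then multiplies by n again.
attempt = gram / n * n

# Adding parentheses alone is not enough: n * n is elementwise, shape (1, 3).
# The denominator needs every pairwise product n[i] * n[j].
denom = n.T * n                                # broadcasts (3, 1) * (1, 3) -> (3, 3)
cos_sim = gram / denom
```

Here attempt ends up equal to the plain matrix of dot products, while cos_sim has ones on the diagonal, as a cosine similarity matrix should.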

Try this:

import numpy as np

a = np.array([[1, 2, 3], [1, 2, 2]])
n = np.linalg.norm(a, axis=0).reshape(1, a.shape[1])
a.T.dot(a) / n.T.dot(n)

array([[ 1.        ,  1.        ,  0.98058068],
       [ 1.        ,  1.        ,  0.98058068],
       [ 0.98058068,  0.98058068,  1.        ]])

This assignment for n would also have worked:

np.linalg.norm(a, axis=0)[None, :]
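As for the first part of the question, the 'cosine' metric of cdist is a distance, that is 1 minus the similarity, so subtracting its result from 1 should give the same matrix as the manual computation. A minimal check, assuming SciPy is installed (the names manual and via_cdist are only illustrative):

```python
import numpy as np
from scipy.spatial import distance

a = np.array([[1, 2, 3], [1, 2, 2]])

# Manual column-wise cosine similarity, using the norm-based approach.
n = np.linalg.norm(a, axis=0)[None, :]         # column L2 norms, shape (1, 3)
manual = a.T.dot(a) / n.T.dot(n)

# cdist compares rows, so pass the transpose to compare columns.
# 'cosine' returns a distance, so subtract it from 1 to get similarity.
via_cdist = 1 - distance.cdist(a.T, a.T, 'cosine')

print(np.allclose(manual, via_cdist))
```

Note that the manual version assumes no column is all zeros; a zero-norm column would put a zero in the denominator.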


 