
How to get similarity from LSA

I am working on latent semantic analysis (LSA) and am trying to compute the similarity between two documents. When I run my LSA code in Python, I get the following output:

Here are the singular values
[ 0.7376057   0.4596623   0.25422212]
Here are the first 3 columns of the U matrix
[[ 0.98465137 -0.172792   -0.02458864]
[ 0.15675976  0.81362269  0.55986114]
[ 0.07673365  0.55512255 -0.82822153]]
Here are the first 3 rows of the Vt matrix
[[ 0.08861949  0.02992777  0.36751379  0.9253024 ]
[ 0.78716383  0.34742637  0.43792207 -0.26056147]
[ 0.29462756 -0.93722956  0.17407106 -0.06704194]]

How do I find the similarity from these numbers?

https://en.wikipedia.org/wiki/Latent_semantic_analysis explains LSA (also called LSI) very well, and it covers your problem too.

Say you want to determine the similarity between documents i and j. Take the i-th column of V^t (= d_i) and the j-th column of V^t (= d_j).

Then take the cosine similarity of diag(S) * d_i and diag(S) * d_j, where S holds the singular values.

The closer this value is to +1, the more similar the two documents are.
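As a minimal sketch of those steps, here is how it could look with NumPy, using the singular values and Vt printed in the question. It assumes the columns of Vt correspond to documents (so there are four of them here), and the names `doc_similarity` and `sim_matrix` are just illustrative, not from the original code.

```python
import numpy as np

# Singular values and Vt copied from the output in the question.
# Assumption: Vt is 3 x 4 with one column per document.
s = np.array([0.7376057, 0.4596623, 0.25422212])
Vt = np.array([
    [0.08861949, 0.02992777, 0.36751379, 0.9253024],
    [0.78716383, 0.34742637, 0.43792207, -0.26056147],
    [0.29462756, -0.93722956, 0.17407106, -0.06704194],
])


def doc_similarity(s, Vt, i, j):
    """Cosine similarity between documents i and j in the LSA space."""
    d_i = s * Vt[:, i]  # equivalent to np.diag(s) @ Vt[:, i]
    d_j = s * Vt[:, j]
    return float(d_i @ d_j / (np.linalg.norm(d_i) * np.linalg.norm(d_j)))


# All pairwise similarities at once: scale, normalise, take dot products.
docs = (s[:, None] * Vt).T  # one row per document
unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
sim_matrix = unit @ unit.T

print(doc_similarity(s, Vt, 0, 1))
print(sim_matrix)
```

`doc_similarity(s, Vt, 0, 1)` compares the first two documents; `sim_matrix[i, j]` holds the same value for every pair, with 1 on the diagonal.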
