I am working on latent semantic analysis and trying to compute the similarity between two documents. When I run my latent semantic analysis code in Python, I get:
Here are the singular values
[ 0.7376057 0.4596623 0.25422212]
Here are the first 3 columns of the U matrix
[[ 0.98465137 -0.172792 -0.02458864]
[ 0.15675976 0.81362269 0.55986114]
[ 0.07673365 0.55512255 -0.82822153]]
Here are the first 3 rows of the Vt matrix
[[ 0.08861949 0.02992777 0.36751379 0.9253024 ]
[ 0.78716383 0.34742637 0.43792207 -0.26056147]
[ 0.29462756 -0.93722956 0.17407106 -0.06704194]]
How do I find the similarity from these numbers?
https://en.wikipedia.org/wiki/Latent_semantic_analysis explains LSI very well, including your problem.
Say you want to determine the similarity between documents i and j. Take the i-th column of V^t (= d_i) and the j-th column of V^t (= d_j).
Compute the cosine similarity of diag(S) * d_i and diag(S) * d_j.
The closer this value is to +1, the more similar the two documents are.
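As a rough sketch of those steps with NumPy, using the numbers from the question: this assumes the original matrix had terms as rows and documents as columns, so that each of the 4 columns of V^t is one document. The names `document_similarity` and `similarity_matrix` are made up for illustration and are not part of any library.

```python
import numpy as np

# Values copied from the output in the question.
S = np.array([0.7376057, 0.4596623, 0.25422212])
Vt = np.array([
    [0.08861949, 0.02992777, 0.36751379, 0.9253024],
    [0.78716383, 0.34742637, 0.43792207, -0.26056147],
    [0.29462756, -0.93722956, 0.17407106, -0.06704194],
])


def document_similarity(S, Vt, i, j):
    """Cosine similarity of documents i and j in the LSA space.

    Assumes documents correspond to the columns of Vt.
    """
    # diag(S) * d is the same as element-wise scaling by the singular values.
    d_i = S * Vt[:, i]
    d_j = S * Vt[:, j]
    return float(d_i @ d_j / (np.linalg.norm(d_i) * np.linalg.norm(d_j)))


# All pairwise similarities at once: one scaled document vector per row.
docs = (S[:, None] * Vt).T
unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
similarity_matrix = unit @ unit.T

print(document_similarity(S, Vt, 0, 1))
print(np.round(similarity_matrix, 3))
```

Entry `[i, j]` of `similarity_matrix` is the similarity between documents i and j (0-based), so the diagonal is 1 and the matrix is symmetric. If your documents were the rows of the original matrix instead, the same computation would apply to the rows of U rather than the columns of V^t.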