
How is the similarity between sentences calculated with LSA?

I understand how LSA computes the similarity between individual words. I am using LSA from the website lsa.colorado.edu, but I cannot find a source explaining how the similarity between sentences or groups of words is calculated. Is it just done by averaging over all pairwise word similarities?

You can combine word vectors simply by summing them and using the resulting sum as the sentence vector. Since this representation has the same form as a word vector, you can reuse the existing methods for computing semantic similarity between words.

Then, to compute the semantic similarity, take the cosine of the angle between those sentence vectors.
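A minimal sketch of that idea, assuming you already have LSA word vectors available (the small word_vectors dictionary below is made up purely for illustration, not taken from any real LSA space):

```python
import numpy as np

# Toy LSA-style word vectors, purely illustrative.
word_vectors = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.3]),
    "mat": np.array([0.7, 0.0, 0.2]),
}

def sentence_vector(tokens):
    # Sum the vectors of the words that are in the vocabulary;
    # the sum is used as the sentence representation.
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.sum(vecs, axis=0)

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

s1 = sentence_vector("the cat sat on the mat".split())
s2 = sentence_vector("a dog chased the cat".split())
print(cosine(s1, s2))
```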

I'm currently using the S-Space library, and it has a DocumentVectorBuilder class that performs this task.

The cosine similarity between two vectors is their dot product divided by the product of their norms (equivalently, the dot product of the normalized vectors). So, once you have the reduced-rank representation from applying SVD to your term-document frequency matrix, you take the two document vectors in that space and apply the cosine formula to them.
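Here is a hedged end-to-end sketch of that pipeline using scikit-learn rather than any specific LSA toolkit: build a term-document count matrix, reduce it with a truncated SVD (the LSA step), project new sentences into that space, and compare them with cosine similarity. The example sentences and the number of components are arbitrary choices for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a dog chased the cat",
    "the car drove down the road",
    "a truck drove past the car",
]

# Term-document frequency matrix (documents as rows here).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Truncated SVD keeps only the top latent dimensions (the LSA step).
svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(X)

# Fold new sentences into the same LSA space before comparing them.
sentences = ["the dog sat on the mat", "a car drove down the road"]
S_lsa = svd.transform(vectorizer.transform(sentences))

# Cosine similarity between the two sentence vectors in LSA space.
print(cosine_similarity(S_lsa[0:1], S_lsa[1:2]))
```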
