
Latent semantic analysis (LSA) singular value decomposition (SVD) understanding

Bear with me through my modest understanding of LSI (my background is mechanical engineering):

After performing SVD in LSI, you have 3 matrices:

U, S, and V transpose.

U relates words to topics, S is a diagonal matrix whose entries measure the strength of each latent feature, and Vt relates topics to documents.

 U dot S dot Vt

returns the original matrix before SVD. Without doing much (any) in-depth algebra, it seems that:
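The reconstruction step above can be checked numerically. A minimal sketch with NumPy, assuming a small toy term-document matrix (the values are illustrative, not from the original post):

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# The counts here are made up purely for illustration.
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 1.]])

# Reduced SVD: A = U @ S @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)
S = np.diag(s)

# Multiplying the three factors back together recovers the
# original matrix (up to floating-point error).
print(np.allclose(U @ S @ Vt, A))  # True
```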

 U dot S dot **Ut**

returns a term-by-term matrix, which provides a comparison between the terms, i.e. how related one term is to the other terms: a DSM (design structure matrix) of sorts that compares words instead of components. I could be completely wrong, but I tried it on a sample data set, and the results seemed to make sense. It could just be bias, though (I wanted it to work, so I saw what I wanted). I can't post the results because the documents are protected.

My question though is: Does this make any sense? Logically? Mathematically?

Thanks for any time/responses.

If you want to know how related one term is to another you can just compute

(U dot S)

The terms are represented by the row vectors. You can then compute a distance matrix by applying a distance function, such as Euclidean distance, to every pair of row vectors. The resulting matrix is hollow symmetric: the diagonal is zero and all other distances are nonnegative. If the distance A[i,j] is small, the two terms are related; otherwise they are not.
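This can be sketched in a few lines of NumPy. The toy term-document matrix below is an assumption for illustration; the distance computation follows the answer's recipe (rows of U·S as term vectors, pairwise Euclidean distances):

```python
import numpy as np

# Toy term-document matrix (terms x documents); values are illustrative.
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 1.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Each row of U @ diag(s) is one term's coordinates in latent-topic space.
term_vecs = U @ np.diag(s)

# Pairwise Euclidean distance matrix via broadcasting.
diff = term_vecs[:, None, :] - term_vecs[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

# D is hollow symmetric: zero diagonal, nonnegative entries.
# A small D[i, j] means terms i and j are close in the latent space.
print(np.allclose(np.diag(D), 0), np.allclose(D, D.T))  # True True
```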
