Latent semantic analysis (LSA) singular value decomposition (SVD) understanding

Bear with my modest understanding of LSI (I come from a mechanical engineering background):

After performing SVD in LSI, you have 3 matrices:

U, S, and V transpose (Vt).

U relates terms (words) to topics, S is a diagonal matrix of singular values that measures the strength of each topic (latent feature), and Vt relates topics to documents.
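The sketch below makes the three matrices concrete. It assumes NumPy is available and uses a small, made-up term-document matrix `X`, where rows are terms, columns are documents, and the counts are hypothetical. `np.linalg.svd` returns the singular values as a 1-D array, so `S` is built with `np.diag`. The choice of `k = 2` retained topics is arbitrary.

```python
import numpy as np

# Hypothetical toy term-document matrix X: rows are terms, columns are documents
# (raw counts; real LSI pipelines usually apply a weighting such as tf-idf first).
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

# Thin SVD: U is terms x r, s holds the r singular values (descending), Vt is r x documents.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)

# With all r components kept, U S Vt reproduces X up to floating-point error.
print(np.allclose(U @ S @ Vt, X))  # True

# LSI usually keeps only the k largest singular values: a rank-k approximation of X.
k = 2
X_k = U[:, :k] @ S[:k, :k] @ Vt[:k, :]
print(X_k.shape)  # (5, 4)
```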

 U dot S dot Vt

returns the original matrix from before the SVD (or, with the truncated SVD normally used in LSI, a rank-k approximation of it). Without doing much (any) in-depth algebra, it seems that:

 U dot S dot **Ut**

returns a term-by-term matrix, which compares the terms with each other, i.e. how related one term is to the other terms: a DSM (design structure matrix) of sorts that compares words instead of components. I could be completely wrong, but I tried it on a sample data set and the results seemed to make sense. It could just be bias, though (I wanted it to work, so I saw what I wanted). I can't post the results because the documents are protected.
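The short derivation below addresses the "mathematically?" part of the question. It assumes the thin SVD $X = U S V^T$ of the term-document matrix $X$ (the symbol $X$ is introduced here), with $U^T U = I$ and $V^T V = I$:

$$
X X^T = (U S V^T)(U S V^T)^T = U S (V^T V) S U^T = U S^2 U^T
$$

$$
(U S U^T)(U S U^T) = U S (U^T U) S U^T = U S^2 U^T = X X^T
$$

So $U S U^T$ is indeed a symmetric term-by-term matrix. More precisely, it is the positive semidefinite square root of the term co-occurrence matrix $X X^T$, and it weights each topic by $\sigma_i$ rather than $\sigma_i^2$.

The standard LSI term-term comparison keeps the squared weights. With the rank-$k$ truncation, $X_k X_k^T = U_k S_k^2 U_k^T = (U_k S_k)(U_k S_k)^T$, so its entries are dot products between rows of $U_k S_k$.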

My question, though, is: does this make any sense? Logically? Mathematically?

Thanks for any time/responses.

If you want to know how related one term is to another, you can just compute

(U dot S)

The terms are represented by the row vectors. You can then compute a distance matrix by applying a distance function, such as Euclidean distance, to every pair of row vectors. The resulting matrix A is hollow (zeros on the diagonal) and symmetric, with all off-diagonal distances ≥ 0 (strictly > 0 unless two terms have identical vectors). If the distance A[i,j] is small, terms i and j are related; otherwise they are not.
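Here is a sketch of that procedure. It assumes NumPy, a hypothetical toy matrix `X` with made-up term labels, and an arbitrary choice of `k = 2` retained dimensions (the answer itself uses the full U dot S). The pairwise distances are computed with plain broadcasting. `scipy.spatial.distance.pdist` followed by `squareform` would give the same Euclidean matrix.

```python
import numpy as np

# Hypothetical toy term-document matrix X (rows = terms, columns = documents).
terms = ["gear", "shaft", "torque", "invoice", "payment"]
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # retained latent dimensions (arbitrary here; tune on real data)

# Row i of T = U_k S_k is the vector for term i in the k-dimensional concept space.
T = U[:, :k] * s[:k]  # broadcasting, same as U[:, :k] @ np.diag(s[:k])

# Dot products between these rows reproduce the rank-k term-term matrix X_k X_k^T.
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
assert np.allclose(T @ T.T, X_k @ X_k.T)

# Pairwise Euclidean distances between all term vectors.
diff = T[:, None, :] - T[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))

# The distance matrix is hollow (zero diagonal), symmetric and non-negative.
assert np.allclose(np.diag(D), 0) and np.allclose(D, D.T) and np.all(D >= 0)

# Closest pair of distinct terms: a small D[i, j] means terms i and j are related.
masked = D + np.diag(np.full(len(terms), np.inf))
i, j = np.unravel_index(np.argmin(masked), masked.shape)
print(terms[i], terms[j], round(float(D[i, j]), 3))
```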
