SVD output interpretation in Mahout

I am trying to run an SVD job in Mahout. I have a document-term matrix A of size 372053 x 21338 (M = 372053 documents, N = 21338 unique words), so A is of size M x N. I ran the SVD using Mahout and got the cleaned eigenvectors (I gave the expected rank as R = 200). I now have an eigenvector matrix of size R x N.

Stating the SVD equation:

A = U * S * V' (V' being the transpose of V)
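For reference, keeping only the top R singular triplets gives the truncated form below, with the dimensions from the question. The subscripted names U_R, S_R and V_R are notation introduced here. Reading the R x N eigenvector matrix as V_R' is an assumption (its rows have length N, which is the length a right singular vector of A would have), not something the Mahout output itself confirms.

$$
A_{M \times N} \approx U_R \, S_R \, V_R', \qquad U_R \in \mathbb{R}^{M \times R}, \quad S_R \in \mathbb{R}^{R \times R}, \quad V_R' \in \mathbb{R}^{R \times N}
$$

$$
A \, V_R = U_R \, S_R \qquad (M \times R)
$$

The second line follows because the columns of V are orthonormal, so multiplying A = U * S * V' on the right by V_R keeps only the first R columns of U * S.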

I need to convert the matrix A to the new space to get the compressed document vectors (I am trying to implement LSI).

What is the output I get from Mahout SVD, in terms of the equation above? I read on the mailing list that we can get the eigenvalues from the NamedVectors in the generated eigenvector matrix.

Please guide me on how to proceed from here to generate the document-term matrix A in the new space (of size M x R).
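To make the target concrete, here is a minimal NumPy sketch of that projection on a toy matrix. It assumes the R x N eigenvector matrix holds right singular vectors of A as rows; `numpy.linalg.svd` stands in for the Mahout job, and `project_documents` is a hypothetical helper name, not part of Mahout.

```python
import numpy as np


def project_documents(A, eigenvectors):
    """Project documents (rows of A, shape M x N) onto R eigenvectors (shape R x N).

    Returns an M x R matrix: one R-dimensional vector per document.
    """
    return A @ eigenvectors.T


# Toy sizes standing in for M = 372053, N = 21338, R = 200.
rng = np.random.default_rng(0)
M, N, R = 6, 5, 2
A = rng.random((M, N))

# Stand-in for the Mahout output (assumption: rows are right singular vectors).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
eigenvectors = Vt[:R]  # shape R x N

A_reduced = project_documents(A, eigenvectors)
print(A_reduced.shape)  # (6, 2)
```

On the real data the same product would be a distributed or sparse matrix multiplication of A with the transposed eigenvector matrix; the toy version only shows which shapes go where.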

Any help is highly appreciated :)

A good starting point for LSI with Stochastic SVD on Mahout can be found here. The good part is that the paper also describes the folding-in process and is explicit about the output format in terms of the SVD equation.
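As a rough illustration of folding in, the sketch below uses the textbook LSI formula d * V_R * inv(S_R) to map a single document vector into the reduced space. This is the standard formulation and is assumed here, not quoted from the linked paper; `fold_in` is a hypothetical helper, and NumPy's SVD again stands in for the Mahout output.

```python
import numpy as np


def fold_in(doc_vector, V_R, singular_values):
    """Map a document vector (length N) to the R-dimensional LSI space.

    Computes d * V_R * inv(S_R); dividing by the singular values is the
    same as multiplying by the inverse of the diagonal matrix S_R.
    """
    return (doc_vector @ V_R) / singular_values


rng = np.random.default_rng(1)
A = rng.random((8, 5))  # toy document-term matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

R = 3
V_R = Vt[:R].T  # N x R, columns are the top R right singular vectors
s_R = s[:R]     # top R singular values

# Folding in a document that is already a row of A recovers its row of U_R.
folded = fold_in(A[0], V_R, s_R)
print(np.allclose(folded, U[0, :R]))
```

A genuinely new document would be passed in the same way, as a length-N term vector built with the same vocabulary and weighting as the rows of A.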

The work is integrated in the latest version, 0.8, and can be used with the SSVDCli job or through the Mahout CLI with mahout ssvd <options>.
