
SVD output interpretation in Mahout

I am trying to run an SVD job in Mahout. I have created a document x term matrix (say A) of size 372053 x 21338, where M = 372053 is the number of documents and N = 21338 is the number of unique words, so A is of size M*N. I ran the SVD using Mahout with an expected rank of 200 (say R) and got the cleaned eigenvectors. I now have an eigenvector matrix of size R*N.

Stating the SVD equation

A = U * S * V' (V' being transpose of V)

I need to convert the matrix A into the new space to get the compressed document vectors (I am trying to implement LSI).

What is the output I get from Mahout SVD, in terms of the equation above? I read on the mailing list that the eigenvalues can be obtained from the NamedVectors in the generated eigenvector matrix.

Please guide me on how to proceed from here to generate the document-term matrix A in the new space (of size M*R).

Any help is highly appreciated :)

A good starting point for LSI with Stochastic SVD on Mahout can be found here. The good part is that the paper also describes the folding-in process and is explicit about the output format in terms of the SVD equation.

The work is integrated in the latest version, 0.8, and can be used with the SSVDCli job or through the Mahout CLI with mahout ssvd <options>.
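
To tie this back to the equation in the question: if the R*N matrix holds the top R right singular vectors as rows (that is, R rows of V'), then multiplying A by its transpose gives A * V = U * S, which is the M*R document matrix in the reduced space. Whether your cleaned eigenvector output is laid out exactly this way is an assumption you should verify against your data. The following is only a minimal pure-Python sketch on a toy matrix, not Mahout code; the names project, column_norms, A and V_ROWS are made up for illustration.

```python
import math


def project(docs, v_rows):
    """Map each document row (length N) to R coordinates,
    one dot product per singular vector (i.e. docs * V)."""
    return [[sum(t * v for t, v in zip(doc, vec)) for vec in v_rows]
            for doc in docs]


def column_norms(matrix):
    """Euclidean norm of each column. For A * V = U * S these norms
    are the singular values, because the columns of U have unit length."""
    return [math.sqrt(sum(row[j] ** 2 for row in matrix))
            for j in range(len(matrix[0]))]


# Toy document x term matrix A (M = 3 documents, N = 3 terms).
A = [
    [3.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
]

# Its top R = 2 right singular vectors stored as rows (R x N).
# Worked out by hand for this toy matrix; in practice these would be
# read from the SVD job output (assumed layout, see the text above).
h = 1.0 / math.sqrt(2.0)
V_ROWS = [
    [1.0, 0.0, 0.0],
    [0.0, h, h],
]

reduced = project(A, V_ROWS)      # M x R matrix, equal to U * S
sigmas = column_norms(reduced)    # singular values recovered from A * V

# Folding in a new, unseen document uses the same projection.
folded = project([[0.0, 2.0, 0.0]], V_ROWS)[0]
```

If you want coordinates in U rather than U * S, divide each column of the projected matrix by the corresponding singular value. On the real data the projection is a distributed matrix multiplication of A with the transpose of the eigenvector matrix, but the arithmetic is the same.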
