
SVD of large matrix in Python on AWS using Spark

I am migrating an application to AWS that computes a large symmetric, positive-valued matrix and then performs an SVD/eigendecomposition to recover a few key eigenvectors and eigenvalues. The matrix could be 100K x 100K or larger, so I am looking for a distributed operator in Spark that performs the SVD faster than a plain scipy/numpy svd call. I am not assuming sparsity. Can someone advise on how to perform the SVD using Spark?

Spark 2.2.0 added a Python API for singular value decomposition: RowMatrix.computeSVD in pyspark.mllib.linalg.distributed.

from pyspark.mllib.linalg.distributed import RowMatrix

# convert your RDD of row vectors into a RowMatrix
rm = RowMatrix(data_rdd)
# if p is the number of components you wish to retain:
svd = rm.computeSVD(p, computeU=True)
U = svd.U  # distributed RowMatrix of left singular vectors
s = svd.s  # DenseVector of singular values, in descending order
V = svd.V  # local dense matrix of right singular vectors
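A minimal end-to-end sketch follows, assuming an existing SparkContext named sc and a small random NumPy array standing in for the real matrix; the names A, rows_rdd, and p are illustrative, not part of the Spark API. Since the question's matrix is symmetric, the SVD recovers its eigenstructure directly.

import numpy as np
from pyspark.mllib.linalg.distributed import RowMatrix

# toy stand-in for the real 100K x 100K matrix (assumed data)
A = np.random.rand(1000, 1000)
A = (A + A.T) / 2                       # symmetrize for illustration

rows_rdd = sc.parallelize(A.tolist())   # one dense row per RDD element
rm = RowMatrix(rows_rdd)

p = 10                                  # number of components to retain
svd = rm.computeSVD(p, computeU=False)  # skip U if only eigenpairs are needed

V = svd.V.toArray()                     # n x p matrix; columns are right singular vectors
s = svd.s.toArray()                     # top-p singular values, descending

# For a symmetric positive semidefinite matrix the SVD coincides with the
# eigendecomposition, so these are eigenvectors/eigenvalues; for a general
# symmetric matrix the singular values are the eigenvalues' absolute values.
eigenvectors, eigenvalues = V, s

Note that V is returned as a small local matrix (n x p), while U stays distributed as a RowMatrix; setting computeU=False avoids materializing U when only the eigenvectors and eigenvalues are needed.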

