
SVD of large matrix in Python on AWS using Spark

I am migrating an application to AWS that computes a large symmetric, positive-valued matrix and then performs an SVD/eigendecomposition to recover a few key eigenvectors and eigenvalues. The matrix could be 100K x 100K or larger, so I am looking for a distributed operator in Spark that performs the SVD faster than a plain scipy/numpy svd call. I am not assuming sparsity. Can someone advise on how to perform the SVD using Spark?

Spark 2.2.0 added a Python API for singular value decomposition: RowMatrix.computeSVD in pyspark.mllib.linalg.distributed.

from pyspark.mllib.linalg.distributed import RowMatrix

# convert your RDD of row vectors into a RowMatrix
rm = RowMatrix(data_rdd)
# if p is the number of components you wish to retain:
svd = rm.computeSVD(p, computeU=True)
U = svd.U  # distributed RowMatrix of left singular vectors
s = svd.s  # DenseVector of singular values, in descending order
V = svd.V  # local dense matrix of right singular vectors
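A minimal end-to-end sketch follows, assuming an existing SparkContext named sc and a small random NumPy array standing in for the real matrix; the names A, rows_rdd, and p are illustrative, not part of the Spark API. Since the question's matrix is symmetric, the SVD recovers its eigenstructure directly.

import numpy as np
from pyspark.mllib.linalg.distributed import RowMatrix

# toy stand-in for the real 100K x 100K matrix (assumed data)
A = np.random.rand(1000, 1000)
A = (A + A.T) / 2                       # symmetrize for illustration

rows_rdd = sc.parallelize(A.tolist())   # one dense row per RDD element
rm = RowMatrix(rows_rdd)

p = 10                                  # number of components to retain
svd = rm.computeSVD(p, computeU=False)  # skip U if only eigenpairs are needed

V = svd.V.toArray()                     # n x p matrix; columns are right singular vectors
s = svd.s.toArray()                     # top-p singular values, descending

# For a symmetric positive semidefinite matrix the SVD coincides with the
# eigendecomposition, so these are eigenvectors/eigenvalues; for a general
# symmetric matrix the singular values are the eigenvalues' absolute values.
eigenvectors, eigenvalues = V, s

Note that V is returned as a small local matrix (n x p), while U stays distributed as a RowMatrix; setting computeU=False avoids materializing U when only the eigenvectors and eigenvalues are needed.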

