
SVD of large matrix in Python on AWS using Spark

I am migrating an application to AWS that requires the calculation of a large symmetric, positive-valued matrix, followed by an SVD/eigendecomposition to recover some key eigenvectors and eigenvalues. The edge size of the matrix could be 100K or more, so I am looking for a distributed operator in Spark that performs the SVD faster than a plain scipy/numpy svd call. I am not assuming sparsity. Can someone advise on how to perform the SVD using Spark?

Spark version 2.2.0 has a Python API for singular value decomposition.

from pyspark.mllib.linalg.distributed import RowMatrix

# convert your RDD of row vectors into a distributed RowMatrix
rm = RowMatrix(data_rdd)
# if the number of singular values/vectors you wish to retain is p,
# compute the truncated SVD (computeU=True also returns U)
svd = rm.computeSVD(p, computeU=True)
U = svd.U  # distributed RowMatrix of left singular vectors
s = svd.s  # DenseVector of singular values, in descending order
V = svd.V  # local DenseMatrix of right singular vectors
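
As a minimal end-to-end sketch (the SparkSession setup and the toy matrix below are illustrative assumptions, not part of the original post): since the matrix in the question is symmetric and positive, its singular values coincide with its eigenvalues and the columns of V are the corresponding eigenvectors, so the SVD directly recovers the eigendecomposition.

import numpy as np
from pyspark.sql import SparkSession
from pyspark.mllib.linalg.distributed import RowMatrix

spark = SparkSession.builder.appName("svd-example").getOrCreate()
sc = spark.sparkContext

# toy symmetric positive semi-definite matrix A = B^T B (assumed for illustration)
B = np.random.rand(10, 4)
A = B.T @ B

# distribute the rows of A as an RDD of vectors
data_rdd = sc.parallelize(A.tolist())
rm = RowMatrix(data_rdd)

# keep the top p singular triplets
p = 2
svd = rm.computeSVD(p, computeU=True)

# for a symmetric PSD matrix, singular values equal eigenvalues
# and the columns of V are the corresponding eigenvectors (up to sign)
eigenvalues = svd.s.toArray()
eigenvectors = svd.V.toArray()  # shape (4, p), one eigenvector per column
print(eigenvalues)
print(eigenvectors)

Note that U and the input live as distributed RowMatrix objects, while s and V are returned to the driver as local objects; for an edge size of 100K, V (n x p) is still small enough to hold locally when only a few components p are retained.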
