
SVD of large matrix in Python on AWS using Spark

I am migrating an application to AWS that requires the calculation of a large symmetric, positive-valued matrix, followed by an SVD/eigendecomposition to recover some key eigenvectors and eigenvalues. The edge size of the matrix could be 100K or more, so I am looking for a distributed operator in Spark that performs the SVD faster than a plain scipy/numpy svd call. I am not assuming sparsity. Can someone advise on how to perform the SVD using Spark?

Spark version 2.2.0 has a Python API for singular value decomposition.

from pyspark.mllib.linalg.distributed import RowMatrix

# convert your RDD of row vectors into a distributed RowMatrix
rm = RowMatrix(data_rdd)
# if the number of singular values/vectors you wish to retain is p,
# compute the truncated SVD (computeU=True also returns U)
svd = rm.computeSVD(p, computeU=True)
U = svd.U  # distributed RowMatrix of left singular vectors
s = svd.s  # DenseVector of singular values, in descending order
V = svd.V  # local DenseMatrix of right singular vectors
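
As a minimal end-to-end sketch (the SparkSession setup and the toy matrix below are illustrative assumptions, not part of the original post): since the matrix in the question is symmetric and positive, its singular values coincide with its eigenvalues and the columns of V are the corresponding eigenvectors, so the SVD directly recovers the eigendecomposition.

import numpy as np
from pyspark.sql import SparkSession
from pyspark.mllib.linalg.distributed import RowMatrix

spark = SparkSession.builder.appName("svd-example").getOrCreate()
sc = spark.sparkContext

# toy symmetric positive semi-definite matrix A = B^T B (assumed for illustration)
B = np.random.rand(10, 4)
A = B.T @ B

# distribute the rows of A as an RDD of vectors
data_rdd = sc.parallelize(A.tolist())
rm = RowMatrix(data_rdd)

# keep the top p singular triplets
p = 2
svd = rm.computeSVD(p, computeU=True)

# for a symmetric PSD matrix, singular values equal eigenvalues
# and the columns of V are the corresponding eigenvectors (up to sign)
eigenvalues = svd.s.toArray()
eigenvectors = svd.V.toArray()  # shape (4, p), one eigenvector per column
print(eigenvalues)
print(eigenvectors)

Note that U and the input live as distributed RowMatrix objects, while s and V are returned to the driver as local objects; for an edge size of 100K, V (n x p) is still small enough to hold locally when only a few components p are retained.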
