
How to convert a Matrix to RDD[Vector] in Spark

How do I convert an org.apache.spark.mllib.linalg.Matrix to an RDD[org.apache.spark.mllib.linalg.Vector] in Spark?

The matrix comes from an SVD, and I am using the SVD results for clustering analysis.

MLlib's Matrix is a small local matrix. It would probably be more efficient to analyze it locally instead of turning it into an RDD.

Anyway, if your clustering only supports RDD as its input, here's how you can do the transformation:

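import org.apache.spark.mllib.linalg._
import org.apache.spark.rdd.RDD

// Assumes a SparkContext named `sc` is in scope, as in spark-shell.
def toRDD(m: Matrix): RDD[Vector] = {
  // toArray returns the entries in column-major order,
  // so each group of numRows values is one column.
  val columns = m.toArray.grouped(m.numRows)
  val rows = columns.toSeq.transpose // Skip this if you want a column-major RDD.
  val vectors = rows.map(row => Vectors.dense(row.toArray))
  sc.parallelize(vectors)
}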
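
To see what the grouped/transpose step does without starting Spark, here is a minimal sketch of the same reshaping on a plain Scala array. The object name ColumnMajorDemo and the 2x3 sample values are made up for illustration; only the column-major layout is taken from the answer above.

```scala
object ColumnMajorDemo {
  // Hypothetical sample data: column-major entries of the 2x3 matrix
  //   1.0 3.0 5.0
  //   2.0 4.0 6.0
  val numRows = 2
  val values: Array[Double] = Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)

  // Each group of numRows consecutive values is one column.
  val columns: Seq[Seq[Double]] = values.grouped(numRows).map(_.toSeq).toSeq

  // Transposing the columns yields the rows, which is what toRDD parallelizes.
  val rows: Seq[Seq[Double]] = columns.transpose

  def main(args: Array[String]): Unit =
    rows.foreach(row => println(row.mkString(" ")))
}
```

Each inner sequence in rows would become one Vector in the resulting RDD; skipping the transpose would instead give one Vector per column.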
