How to convert RDD[Double] to Vector in Scala Spark

I have an IndexedRowMatrix of doubles. I want to compute the sum of each row of the matrix and save the results to a Vector, which I then want to broadcast. I am creating an RDD of Doubles that contains the sums, but I cannot turn it into a Vector. So the question is basically how to create the Vector I want from the IndexedRowMatrix.

Collect to the driver and construct a vector:

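import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

val sc: SparkContext = ???
val rdd: RDD[Double] = ???
// collect() pulls every element to the driver, so the sums must fit in driver memory
val vec: Vector = Vectors.dense(rdd.collect())
val broadcastVec = sc.broadcast(vec)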
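
The snippet above starts from an RDD[Double] that already exists. As a minimal sketch of the whole path from the question, the row sums can be taken directly from the IndexedRowMatrix. The helper names `rowSums` and `broadcastRowSums` are hypothetical and not part of Spark. The sketch assumes the number of rows fits in an Int and in driver memory. It keeps each row's index next to its sum, because the rows of an IndexedRowMatrix are not guaranteed to come back in index order; any index with no row is left at 0.0.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix

// Hypothetical helper (not a Spark API): sum every row of the matrix
// and return the sums as a local dense vector, ordered by row index.
def rowSums(mat: IndexedRowMatrix): Vector = {
  // numRows() is max row index + 1; assumed small enough for an Int-sized array
  val n = mat.numRows().toInt
  val sums = Array.fill(n)(0.0)
  mat.rows
    .map(row => (row.index, row.vector.toArray.sum)) // (index, row sum) on the executors
    .collect()                                       // bring the pairs to the driver
    .foreach { case (i, s) => sums(i.toInt) = s }    // place each sum at its row index
  Vectors.dense(sums)
}

// Hypothetical helper: broadcast the row sums to all executors.
def broadcastRowSums(sc: SparkContext, mat: IndexedRowMatrix): Broadcast[Vector] =
  sc.broadcast(rowSums(mat))
```

Inside a task, the sum for row i would then be read as `broadcastVec.value(i)`.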

References:

https://spark.apache.org/docs/2.1.0/mllib-data-types.html#local-vector
https://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
