
How to convert RDD[Double] to Vector in Scala Spark

I have an IndexedRowMatrix of doubles. I want to compute the sum of each row and store the results in a Vector, which I then want to broadcast. I can build an RDD[Double] containing the row sums, but I cannot turn it into a Vector. In short: how do I create this Vector from the IndexedRowMatrix?

Collect the RDD to the driver and build a local dense vector from the resulting array. This assumes all the sums fit in driver memory:

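import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

val sc: SparkContext = ???   // your existing SparkContext
val rdd: RDD[Double] = ???   // the row sums, in the order they should appear in the vector
// collect() pulls every element to the driver as an Array[Double]
val vec: Vector = Vectors.dense(rdd.collect())
// ship one read-only copy of the vector to each executor
val broadcastVec: Broadcast[Vector] = sc.broadcast(vec)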
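One caveat: if the RDD[Double] comes straight from something like `mat.rows.map(...)`, its order follows the RDD's partitions, and IndexedRowMatrix does not require its rows to be stored in index order. A minimal sketch of one way to keep entry i aligned with row i, assuming each row index appears at most once and the number of rows fits in an Int, is to collect `(index, sum)` pairs and place each sum by its index. The object name `RowSumsToVector`, the helper `rowSums`, and the sample data are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}

object RowSumsToVector {
  // Hypothetical helper: returns a local dense Vector whose i-th entry is the sum of row i.
  // Assumes each row index appears at most once and numRows fits in an Int.
  def rowSums(mat: IndexedRowMatrix): Vector = {
    val n = mat.numRows().toInt
    // Indices with no stored row keep the default 0.0
    val sums = new Array[Double](n)
    mat.rows
      .map(row => (row.index, row.vector.toArray.sum)) // (index, row sum), computed on executors
      .collect()                                       // only the (index, sum) pairs reach the driver
      .foreach { case (i, s) => sums(i.toInt) = s }    // place each sum by index, not by arrival order
    Vectors.dense(sums)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("row-sums").setMaster("local[*]"))
    try {
      // Rows deliberately stored out of index order
      val rows = sc.parallelize(Seq(
        IndexedRow(1L, Vectors.dense(4.0, 5.0, 6.0)),
        IndexedRow(0L, Vectors.dense(1.0, 2.0, 3.0))
      ))
      val vec = rowSums(new IndexedRowMatrix(rows))
      val broadcastVec: Broadcast[Vector] = sc.broadcast(vec)
      println(broadcastVec.value) // prints [6.0,15.0]
    } finally {
      sc.stop()
    }
  }
}
```

Placing each sum by index means the partition order of `mat.rows` does not matter. An alternative would be `sortByKey()` followed by `values.collect()`, but that produces a compact array, so entries would shift if some row indices are missing.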

References:

https://spark.apache.org/docs/2.1.0/mllib-data-types.html#local-vector
https://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
