How do I sum a set of vectors and produce a new vector in Spark
I am using Spark's Java API to read a large amount of data with the following schema:
profits (Array of Double values):
---------------------------------
[1.0,2.0,3.0]
[2.0,3.0,4.0]
[4.0,6.0,0.0]
Once I have a DataFrame, I want to compute a new vector that is the element-wise sum of all the vectors:
Result:
[7.0,11.0,7.0]
I have seen some examples online of doing this in Scala and Python, but nothing for Java. One Scala approach pairs every element with its position in its row:
val withIndex = profits.flatMap(vec => vec.zipWithIndex) // each row (a,b,c) becomes ((a,0),(b,1),(c,2))
We need to use the index as the key:
val indexKey = withIndex.map { case (v, i) => (i, v) } // ((0,a),(1,b),(2,c))
Finally, summing the values that share an index gives the element-wise total:
val sums = indexKey.reduceByKey(_ + _)
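For the Java API, here is a minimal sketch of the same index-as-key idea. The DataFrame name df, the column name "profits", and the variable names are assumptions for illustration, not taken from any official example:

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// Extract the array column from the DataFrame as a JavaRDD of lists
// (df and the column name "profits" are assumed)
JavaRDD<List<Double>> profits = df.javaRDD()
        .map(row -> row.getList(row.fieldIndex("profits")));

// Emit one (index, value) pair per element of every vector
JavaPairRDD<Integer, Double> indexKey = profits.flatMapToPair(vec -> {
    List<Tuple2<Integer, Double>> pairs = new ArrayList<>();
    for (int i = 0; i < vec.size(); i++) {
        pairs.add(new Tuple2<>(i, vec.get(i)));
    }
    return pairs.iterator();
});

// Sum the values that share an index, then restore the element order
List<Tuple2<Integer, Double>> result = indexKey
        .reduceByKey(Double::sum)
        .sortByKey()
        .collect();
// result: [(0,7.0), (1,11.0), (2,7.0)]

This mirrors the Scala snippet above: the vector position becomes the key, so reduceByKey adds matching components across all rows.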
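If you would rather stay at the DataFrame level, the built-in posexplode function (in org.apache.spark.sql.functions, available since Spark 2.1) should give the same result without dropping to RDDs. This sketch again assumes the DataFrame is named df:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.posexplode;
import static org.apache.spark.sql.functions.sum;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// posexplode turns each array row into (pos, col) rows,
// so grouping by pos sums matching components across all vectors
Dataset<Row> summed = df
        .select(posexplode(col("profits")))
        .groupBy("pos")
        .agg(sum("col").alias("total"))
        .orderBy("pos");
summed.show(); // one row per vector position, e.g. (0,7.0), (1,11.0), (2,7.0)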