
How do I sum a set of vectors and produce a new vector in Spark?

I am using Spark's Java API and read a lot of data with the following schema:

profits (Array of Double values):
--------------------------------- 
[1.0,2.0,3.0] 
[2.0,3.0,4.0] 
[4.0,6.0,0.0]

Once I have the DataFrame, I want to compute a new vector that is the element-wise sum of all the vectors:

Result:
[7.0,11.0,7.0]

I see some examples online of doing this in Scala and Python, but nothing for Java.
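For reference, the Scala examples referred to usually reduce the RDD element-wise in one step. A minimal sketch, assuming profits is an RDD[Array[Double]] whose arrays all have the same length (the names here are illustrative, not from the original post):

import org.apache.spark.rdd.RDD

// Assumed: profits is an RDD[Array[Double]] and every array has the same length.
def sumVectors(profits: RDD[Array[Double]]): Array[Double] =
  profits.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })

reduce needs an associative and commutative function, which element-wise addition satisfies, so this works regardless of how the rows are partitioned.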

val withIndex = profits.zipWithIndex // ((a,0),(b,1),(c,2))

We need to use the index as the key:

val indexKey = withIndex.map { case (k, v) => (v, k) } // ((0,a),(1,b),(2,c))

Finally,

val counts = indexKey.reduceByKey((k, v) => k + v)
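Note that when profits is an RDD of arrays, as in the question, the index has to come from each row's elements rather than from the rows themselves, so the zipWithIndex/swap/reduceByKey steps move inside a flatMap. A minimal self-contained Scala sketch along those lines (the sample data, variable names, and local SparkSession setup are assumptions for illustration):

import org.apache.spark.sql.SparkSession

object SumVectors {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sum-vectors").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // Sample rows matching the question's data (assumed here for illustration).
    val profits = sc.parallelize(Seq(
      Array(1.0, 2.0, 3.0),
      Array(2.0, 3.0, 4.0),
      Array(4.0, 6.0, 0.0)
    ))

    // Pair every element with its position inside its row, use the position
    // as the key, and sum the values that share a key.
    val sums = profits
      .flatMap(row => row.zipWithIndex.map { case (value, idx) => (idx, value) })
      .reduceByKey(_ + _)
      .sortByKey()   // restore the original element order
      .values
      .collect()

    println(sums.mkString("[", ",", "]")) // prints [7.0,11.0,7.0]

    spark.stop()
  }
}

The same index-as-key idea carries over to the Java API: flatMapToPair on a JavaRDD produces a JavaPairRDD, which then supports reduceByKey.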
