
How to sum part of a list in an RDD

I have an RDD of (key, list) pairs, and I would like to sum part of each list: the second and third elements, giving (key, element2 + element3).

Input:

(1, List(2.0, 3.0, 4.0, 5.0)), (2, List(1.0, -1.0, -2.0, -3.0))

The output should look like this:

(1, 7.0), (2, -3.0)

Thanks

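You can map over the RDD and index into the list, which is the second part of each tuple. Return the key alongside the sum so the result has the (key, sum) shape asked for:

yourRddOfTuples.map(tuple => { val list = tuple._2; (tuple._1, list(1) + list(2)) })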
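Because the function passed to map is ordinary Scala, one way to check it is to run it on a local Seq that stands in for the RDD. This is a minimal sketch assuming the sample data from the question. The object name IndexSum is hypothetical, and no Spark is involved. On a real RDD you would call collect() to bring the results back to the driver.

```scala
// Sketch: a local Seq stands in for the RDD, so this runs without Spark.
object IndexSum {
  // Sample data from the question: (key, list of doubles)
  val pairs: Seq[(Int, List[Double])] = Seq(
    (1, List(2.0, 3.0, 4.0, 5.0)),
    (2, List(1.0, -1.0, -2.0, -3.0))
  )

  // Keep the key and add the elements at indices 1 and 2 (the 2nd and 3rd).
  val result: Seq[(Int, Double)] =
    pairs.map(tuple => { val list = tuple._2; (tuple._1, list(1) + list(2)) })

  def main(args: Array[String]): Unit =
    println(result) // List((1,7.0), (2,-3.0))
}
```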

Update after your comment: convert the list to a Vector first:

yourRddOfTuples.map(tuple => { val vs = tuple._2.toVector; (tuple._1, vs(1) + vs(2)) })

Or, if you do not want to convert:

yourRddOfTuples.mapValues(_.drop(1).take(2).sum)

This keeps the key (mapValues only transforms the value of each pair), skips the first list element (.drop(1)), takes the next two (.take(2), which returns fewer if the list is shorter) and sums them (.sum).
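The point about shorter lists is the main practical difference from indexing. The following small sketch runs on plain Scala lists. The inputs are made up for illustration, and the object name ShortLists is hypothetical. It shows that drop/take/sum degrades gracefully. By contrast, l(1) + l(2) throws IndexOutOfBoundsException, which is caught here with Try.

```scala
// Sketch: compare drop/take/sum with direct indexing on short lists.
// The inputs below are hypothetical, not taken from the question.
import scala.util.Try

object ShortLists {
  val lists: List[List[Double]] = List(List(5.0), List(1.0, 2.0), List(2.0, 3.0, 4.0))

  // drop(1).take(2) never throws; it just yields fewer elements
  // (the sum of an empty list is 0.0).
  val viaDropTake: List[Double] = lists.map(_.drop(1).take(2).sum)

  // Direct indexing throws IndexOutOfBoundsException when an index is missing;
  // Try(...).toOption turns that failure into None.
  val viaIndex: List[Option[Double]] = lists.map(l => Try(l(1) + l(2)).toOption)

  def main(args: Array[String]): Unit = {
    println(viaDropTake) // List(0.0, 2.0, 7.0)
    println(viaIndex)    // List(None, None, Some(7.0))
  }
}
```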

Alternatively, pattern-match on each key-list pair and add the 2nd and 3rd list elements:

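// sc is the SparkContext that spark-shell provides
val rdd = sc.parallelize(Seq(
  (1, List(2.0, 3.0, 4.0, 5.0)),
  (2, List(1.0, -1.0, -2.0, -3.0))
))

// Keep the key k and add the elements at indices 1 and 2 of the list l
rdd.map { case (k, l) => (k, l(1) + l(2)) }.collect()
// res1: Array[(Int, Double)] = Array((1,7.0), (2,-3.0))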
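To sum a different part of each list, you can pass the indices as parameters. This is a minimal sketch using a hypothetical helper, sumRange(from, until), built on slice. For 0 <= from <= until, slice selects the same elements as drop(from).take(until - from). A local Seq stands in for the RDD here. In a Spark job, the helper could instead be applied to each value with rdd.mapValues, which keeps the key.

```scala
// Sketch: sum the elements at indices [from, until) of each list, keeping the key.
// RangeSum and sumRange are hypothetical names used only for illustration.
object RangeSum {
  // slice is safe on short lists: it simply returns fewer elements.
  def sumRange(from: Int, until: Int)(values: List[Double]): Double =
    values.slice(from, until).sum

  // Sample data from the question; a local Seq stands in for the RDD.
  val pairs: Seq[(Int, List[Double])] = Seq(
    (1, List(2.0, 3.0, 4.0, 5.0)),
    (2, List(1.0, -1.0, -2.0, -3.0))
  )

  // Indices 1 and 2 correspond to element2 + element3 in the question.
  val secondAndThird: Seq[(Int, Double)] =
    pairs.map { case (k, l) => (k, sumRange(1, 3)(l)) }

  // The same helper with another range: indices 2 and 3 (3rd and 4th elements).
  val thirdAndFourth: Seq[(Int, Double)] =
    pairs.map { case (k, l) => (k, sumRange(2, 4)(l)) }

  def main(args: Array[String]): Unit = {
    println(secondAndThird) // List((1,7.0), (2,-3.0))
    println(thirdAndFourth) // List((1,9.0), (2,-5.0))
  }
}
```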

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please include the site URL or the original address. For any questions, contact: yoyou2525@163.com.