Map over CompactBuffer in an rdd

I have an RDD produced by groupByKey, as shown below:

(1, CompactBuffer(2.0, 3.0, 4.0)), (2, CompactBuffer(1.0, -1.0, -2.0))

I want to use mapValues to turn each group's values (x_1, x_2, x_3) into (1*x_1^2, 2*x_2^2, 3*x_3^2),

which should look like this,

(1, CompactBuffer(4.0, 18.0, 48.0)), (2, CompactBuffer(1.0, 2.0, 12.0))

What should I do?

Thanks for your help.

You can use mapValues and zip each CompactBuffer with Stream(1, 2, ...) to pair every value with its 1-based position, as in the following:

val rdd = sc.parallelize(Seq(
  (1, 2.0),
  (1, 3.0),
  (1, 4.0),
  (2, 1.0),
  (2, -1.0),
  (2, -2.0)
))

val groupedRDD = rdd.groupByKey

// collect brings the grouped pairs back to the driver for inspection
groupedRDD.collect
// res1: Array[(Int, Iterable[Double])] = Array(
//   (1,CompactBuffer(2.0, 3.0, 4.0)), (2,CompactBuffer(1.0, -1.0, -2.0))
// )

groupedRDD.mapValues( l =>
  l.zip(Stream from 1).map{ case (v, i) => v * v * i }
).collect
// res2: Array[(Int, Iterable[Double])] = Array(
//   (1,List(4.0, 18.0, 48.0)), (2,List(1.0, 2.0, 12.0))
// )
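
As a variation on the same idea, the per-group function can be sketched with zipWithIndex instead of zipping with a Stream. The object and method names below (WeightedSquares, weightedSquares) are hypothetical and not part of the original answer, and a plain Map stands in for the grouped RDD so the sketch runs without a SparkContext:

```scala
object WeightedSquares {
  // Hypothetical helper: multiplies the square of each value by its
  // 1-based position. zipWithIndex is 0-based, hence the (i + 1).
  def weightedSquares(values: Iterable[Double]): List[Double] =
    values.zipWithIndex.map { case (v, i) => (i + 1) * v * v }.toList

  def main(args: Array[String]): Unit = {
    // Stand-in for the grouped RDD contents (assumption: same sample data).
    val grouped = Map(
      1 -> Seq(2.0, 3.0, 4.0),
      2 -> Seq(1.0, -1.0, -2.0)
    )
    val result = grouped.map { case (k, vs) => k -> weightedSquares(vs) }
    println(result)
  }
}
```

With the RDD from the answer, the same function could presumably be passed as groupedRDD.mapValues(WeightedSquares.weightedSquares). Note that Spark does not guarantee the order of values within each group after groupByKey, so position-dependent weights like these are only reliable if the order is fixed some other way.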

