
Spark - Best way to aggregate two values using reduceByKey

Using Spark, I have a pair RDD[(String, (Int, Int))] . I am trying to find the best way to compute multiple sums per key (in this case, the sum of each Int shown separately). I would like to do this with reduceByKey .

Is this possible?

Sure.

// Pair RDD of (key, (Int, Int)) values
val rdd = sc.parallelize(Array(("foo", (1, 10)), ("foo", (2, 2)), ("bar", (5, 5))))
// Merge values per key by summing each component of the tuple independently
val res = rdd.reduceByKey((p1, p2) => (p1._1 + p2._1, p1._2 + p2._2))
res.collect()  // Array((foo,(3,12)), (bar,(5,5))) -- element order may vary
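
If you prefer named components over _1 / _2 , the same reduction can be written with a pattern-matching function literal. This is an equivalent sketch, not part of the original answer; res2 is just an illustrative name:

// Equivalent: destructure both (Int, Int) values in a case pattern
val res2 = rdd.reduceByKey { case ((a1, b1), (a2, b2)) => (a1 + a2, b1 + b2) }
res2.collect()  // same result: each Int summed independently per key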
