Using Spark, I have a pair RDD[(String, (Int, Int))]. I am trying to find the best way to compute multiple sums per key (in this case, the sum of each Int shown separately). I would like to do this with reduceByKey.
Is this possible?
Sure.
val rdd = sc.parallelize(Array(("foo", (1, 10)), ("foo", (2, 2)), ("bar", (5, 5))))
// Sum each tuple component independently when merging two values for the same key.
val res = rdd.reduceByKey((p1, p2) => (p1._1 + p2._1, p1._2 + p2._2))
res.collect() // e.g. Array((foo,(3,12)), (bar,(5,5))) — element order may vary
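For intuition, the same per-key pairwise sum can be sketched in plain Scala collections, without a Spark cluster. This is only an illustration of what the reduce function does; `groupBy` here stands in for Spark's shuffle-by-key:

```scala
// Plain-Scala sketch (no Spark) of reduceByKey with the tuple-summing function.
val data = Seq(("foo", (1, 10)), ("foo", (2, 2)), ("bar", (5, 5)))

val sums: Map[String, (Int, Int)] =
  data.groupBy(_._1).map { case (k, pairs) =>
    // Reduce all (Int, Int) values for key k, component by component,
    // exactly as the function passed to reduceByKey would.
    k -> pairs.map(_._2).reduce((p1, p2) => (p1._1 + p2._1, p1._2 + p2._2))
  }

println(sums) // Map(foo -> (3,12), bar -> (5,5))
```

Because the reduce function is associative and commutative, Spark can apply it both locally (map-side combine) and after the shuffle, which is what makes `reduceByKey` efficient for this kind of per-component aggregation.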