I need to perform a reduceByKey
on lists. What would be the fastest solution? I'm using the :::
operator to merge two lists in the reduce operation, but :::
is O(n), so I am afraid the reduce operation will end up being O(n²).
Code example:
val rdd: RDD[(Int, List[Int])] = getMyRDD()
rdd.reduceByKey(_ ::: _)
What would be the best/most efficient solution?
The best you can do is:
rdd.groupByKey.mapValues(_.flatten.toList)
This will skip the map-side reduction and concatenate each key's lists once, after the shuffle.
If you want map-side reduction, you can use aggregateByKey:
import scala.collection.mutable.ArrayBuffer
rdd.aggregateByKey(ArrayBuffer[Int]())(_ ++= _, _ ++= _).mapValues(_.toList)
but usually it will be significantly more expensive compared to the first solution.
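To make the cost difference concrete, here is a minimal sketch using plain Scala collections, without Spark, of roughly what each approach does to the values of a single key. The object name, the sample data in `chunks`, and the three helper methods are hypothetical and only for illustration; this is a model of the per-key merging, not of Spark's shuffle behaviour.

```scala
import scala.collection.mutable.ArrayBuffer

object ListMergeSketch {
  // Hypothetical stand-in for the values of one key: several small lists.
  val chunks: List[List[Int]] = List(List(1, 2), List(3), List(4, 5, 6), List(7))

  // Roughly what reduceByKey(_ ::: _) does per key: each ::: copies its
  // left operand, so a growing left-hand accumulator is re-copied at
  // every step, which is where the quadratic worry comes from.
  def mergeWithConcat(parts: List[List[Int]]): List[Int] =
    parts.reduce(_ ::: _)

  // Roughly what the aggregateByKey variant does per key: append into a
  // single mutable buffer, then convert to a List once at the end.
  def mergeWithBuffer(parts: List[List[Int]]): List[Int] =
    parts.foldLeft(ArrayBuffer.empty[Int])(_ ++= _).toList

  // Roughly what groupByKey.mapValues(_.flatten.toList) does per key:
  // one pass over all grouped lists.
  def mergeWithFlatten(parts: Iterable[List[Int]]): List[Int] =
    parts.flatten.toList
}
```

All three helpers produce the same list for the same input order; they differ only in how often elements get copied. On a real RDD the order in which values for a key are combined is not guaranteed, so the element order of the resulting lists should not be relied on.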