
How to merge and aggregate two Maps in Scala most efficiently?

I have the following 2 maps:

val map12: Map[(String, String), Double] = Map(
  ("Sam", "0203") -> 16216.0, ("Jam", "0157") -> 50756.0, ("Pam", "0129") -> 3052.0)
val map22: Map[(String, String), Double] = Map(
  ("Jam", "0157") -> 16145.0, ("Pam", "0129") -> 15258.0, ("Sam", "0203") -> -1638.0,
  ("Dam", "0088") -> -8440.0, ("Ham", "0104") -> 4130.0, ("Hari", "0268") -> -108.0,
  ("Om", "0169") -> 5486.0, ("Shiv", "0181") -> 275.0, ("Brahma", "0148") -> 18739.0)

In the first approach I use foldLeft to merge the maps and sum the values of matching keys:

val t1 = System.nanoTime()
// Start from map22 and add each map12 entry, summing the values when the key already exists.
val merged1 = map12.foldLeft(map22) { case (acc, (k, v)) => acc + (k -> (v + acc.getOrElse(k, 0.0))) }
val t2 = System.nanoTime()
println(" First Time taken :" + (t2 - t1))

In the second approach I try the aggregate() function, which supports parallel execution:

def merge(map12:Map[(String,String),Double], map22:Map[(String,String),Double]):Map[(String,String),Double]=
  map12 ++ map22.map{case(k, v) => k -> (v + (map12.getOrElse(k, 0.0)))}

val inArr= Array(map12,map22)

val t5 = System.nanoTime()
val mergedNew12 = inArr.par.aggregate(Map[(String,String),Double]())(merge,merge)
val t6 = System.nanoTime()
println(" Second Time taken :"+ (t6-t5))
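Before comparing speed, it may be worth confirming that both approaches produce the same totals. The sketch below is only an illustration: the object name MergeCheck is hypothetical, and a sequential foldLeft over the two maps stands in for inArr.par.aggregate so that the block does not depend on the parallel collections being available.

```scala
object MergeCheck {
  type Key = (String, String)

  val map12: Map[Key, Double] = Map(
    ("Sam", "0203") -> 16216.0, ("Jam", "0157") -> 50756.0, ("Pam", "0129") -> 3052.0)
  val map22: Map[Key, Double] = Map(
    ("Jam", "0157") -> 16145.0, ("Pam", "0129") -> 15258.0, ("Sam", "0203") -> -1638.0,
    ("Dam", "0088") -> -8440.0, ("Ham", "0104") -> 4130.0, ("Hari", "0268") -> -108.0,
    ("Om", "0169") -> 5486.0, ("Shiv", "0181") -> 275.0, ("Brahma", "0148") -> 18739.0)

  // First approach: fold the entries of map12 into map22.
  val viaFold: Map[Key, Double] =
    map12.foldLeft(map22) { case (acc, (k, v)) => acc + (k -> (v + acc.getOrElse(k, 0.0))) }

  // Same logic as merge in the question, with neutral parameter names.
  def merge(a: Map[Key, Double], b: Map[Key, Double]): Map[Key, Double] =
    a ++ b.map { case (k, v) => k -> (v + a.getOrElse(k, 0.0)) }

  // Assumption: sequential stand-in for inArr.par.aggregate(Map())(merge, merge).
  val viaMerge: Map[Key, Double] =
    Seq(map12, map22).foldLeft(Map.empty[Key, Double])(merge)

  def main(args: Array[String]): Unit = {
    println(viaFold == viaMerge)       // prints true
    println(viaFold(("Sam", "0203")))  // prints 14578.0
  }
}
```

Since the results match, any difference between the two approaches is purely one of running time.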

However, I notice that the foldLeft version is much faster than the aggregate version.

I am looking for advice on how to make this operation as efficient as possible.

If you want aggregate to run more efficiently with par, try Vector instead of Array; it is one of the collections best suited to parallel algorithms.

On the other hand, parallel execution has some overhead, so if you do not have enough data it will not pay off.

With the data you gave us, Vector.par.aggregate is faster than Array.par.aggregate, but the sequential Vector.aggregate (without par) is faster than foldLeft.
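Because parallelism does not pay off on inputs this small, a purely sequential merge is one option to consider. The sketch below is not part of the original answer and has not been benchmarked here; mergeAll and MergeAllSketch are hypothetical names. It assumes that summing into a single mutable accumulator may avoid some of the intermediate immutable maps that merge creates.

```scala
import scala.collection.mutable

object MergeAllSketch {
  type Key = (String, String)

  // Hypothetical helper: sums the values per key across any number of maps,
  // updating one mutable HashMap and converting to an immutable Map at the end.
  def mergeAll(maps: Seq[Map[Key, Double]]): Map[Key, Double] = {
    val acc = mutable.HashMap.empty[Key, Double]
    for (m <- maps; (k, v) <- m)
      acc.update(k, acc.getOrElse(k, 0.0) + v)
    acc.toMap
  }
}
```

With the maps from the question this would be called as MergeAllSketch.mergeAll(Seq(map12, map22)).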

val inVector= Vector(map12,map22)

val t7 = System.nanoTime()
val mergedNew12_2 = inVector.aggregate(Map[(String,String),Double]())(merge,merge)
val t8 = System.nanoTime()
println(" Third Time taken :"+ (t8-t7))

These are my timings, in nanoseconds as returned by System.nanoTime:

First Time taken  :6431723
Second Time taken :147474028
Third Time taken  :4855489
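Note that each number above comes from a single System.nanoTime measurement. On the JVM such one-off timings can be dominated by class loading and JIT warm-up, so the order in which the snippets run may affect the result. A minimal sketch of a slightly fairer measurement follows; avgNanos and TimingSketch are hypothetical names, and the warm-up and run counts are arbitrary assumptions.

```scala
object TimingSketch {
  // Hypothetical helper: evaluates `body` `warmup` times untimed, then returns
  // the mean wall-clock nanoseconds per evaluation over `runs` timed evaluations.
  def avgNanos[A](warmup: Int, runs: Int)(body: => A): Long = {
    require(runs > 0, "runs must be positive")
    (1 to warmup).foreach(_ => body)
    val start = System.nanoTime()
    (1 to runs).foreach(_ => body)
    (System.nanoTime() - start) / runs
  }

  def main(args: Array[String]): Unit = {
    val a = Map(("Sam", "0203") -> 16216.0, ("Jam", "0157") -> 50756.0)
    val b = Map(("Jam", "0157") -> 16145.0, ("Sam", "0203") -> -1638.0)
    // Time the foldLeft merge; the other variants can be passed in the same way.
    val ns = avgNanos(warmup = 10000, runs = 10000) {
      a.foldLeft(b) { case (acc, (k, v)) => acc + (k -> (v + acc.getOrElse(k, 0.0))) }
    }
    println(s"foldLeft: about $ns ns per merge")
  }
}
```

This is still a rough measurement; a dedicated harness such as JMH would give more trustworthy numbers.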

