
Sorting and Merging Using spark-shell

I have a string array in Scala:

Array[String] = Array(apple, banana, oranges, grapes, lichi, anar)

I have converted it into a format like this:

Array[(Int, String)] = Array((5,apple), (6,banana), (7,oranges), (6,grapes), (5,lichi), (4,anar))

and I want output like this:

Array[(Int, String)] = Array((4,anar), (5,applelichi), (6,bananagrapes), (7,oranges))

That is, after sorting I want to concatenate the words that share the same key. I have done the sorting. Here's my code:

val a = sc.parallelize(List("apple","banana","oranges","grapes","lichi","anar"))
val b = a.map(x =>(x.length,x))
val c = b.sortBy(_._2)

You can use groupByKey() to collect the words for each key, then merge each group into a single string with mkString. A small example using what you have (a and b are the same as in your code):

val c = b.groupByKey().map{case (key, list) => (key, list.toList.sorted.mkString)}.sortBy(_._1)

c.collect().foreach(println)

Which will give you:

(4,anar)
(5,applelichi)
(6,bananagrapes)
(7,oranges)
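
The same group-then-concatenate logic can be checked without a SparkContext. Below is a minimal sketch using plain Scala collections instead of an RDD; the names words and merged are illustrative and not part of the original code, and groupBy on a local List stands in for groupByKey() on the RDD.

```scala
// Plain Scala collections only: no Spark needed to try this out.
val words = List("apple", "banana", "oranges", "grapes", "lichi", "anar")

val merged: List[(Int, String)] =
  words
    .groupBy(_.length)                                   // Map[Int, List[String]] keyed by word length
    .map { case (len, ws) => (len, ws.sorted.mkString) } // sort each group, then concatenate it
    .toList
    .sortBy(_._1)                                        // order the result by key

merged.foreach(println)
```

Sorting the words inside each group before calling mkString keeps the concatenated string deterministic. That matters more on a real RDD, where the order of values within a group after a shuffle is not guaranteed.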
