
Order by value in Spark pair RDD

I have a Spark pair RDD of (key, count) as below:

Array[(String, Int)] = Array((a,1), (b,2), (c,1), (d,3))

Using the Spark Scala API, how do I get a new pair RDD that is sorted by value?

Required result: Array((d,3), (b,2), (a,1), (c,1))

This should work:

// Assuming the pair's second type has an Ordering, which is the case for Int
rdd.sortBy(_._2) // ascending; same as rdd.sortBy(pair => pair._2)
rdd.sortBy(_._2, ascending = false) // descending, as in the required result

(Though you might also want to take the key into account when there are ties.)
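One way to handle ties is to sort on a composite key. The following is only a sketch under some assumptions: the object name `SortByValue`, the helper `sortByCountThenKey`, and the local `SparkSession` setup are made up for illustration, and negating the count assumes counts are never `Int.MinValue`. It sorts by count descending and breaks ties by key ascending, relying on the implicit `Ordering` that Scala provides for tuples.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SortByValue {
  // Hypothetical helper: count descending (via negation), then key ascending for ties.
  def sortByCountThenKey(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
    rdd.sortBy { case (key, count) => (-count, key) }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `sc` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sort-by-value")
      .getOrCreate()
    val rdd = spark.sparkContext.parallelize(Seq(("a", 1), ("b", 2), ("c", 1), ("d", 3)))

    // Prints: (d,3), (b,2), (a,1), (c,1)
    println(sortByCountThenKey(rdd).collect().mkString(", "))
    spark.stop()
  }
}
```

Without the tie-breaker, the relative order of (a,1) and (c,1) is not something to rely on.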

Sorting by key and by value, in ascending and descending order:

val textfile = sc.textFile("file:///home/hdfs/input.txt")
val words = textfile.flatMap(line => line.split(" "))
//Sort by value in descending order. For ascending order remove 'false' argument from sortBy
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortBy(_._2,false)
//for ascending order by value
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortBy(_._2)

//Sort by key in ascending order
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortByKey
//Sort by key in descending order
words.map( word => (word,1)).reduceByKey((a,b) => a+b).sortByKey(false)

This can also be done another way, by swapping key and value and then applying sortByKey:

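// Sort by value: swap key and value, use sortByKey, then swap back
val sortbyvalue = words.map(word => (word, 1)).reduceByKey((a, b) => a + b)
val descendingSortByvalue = sortbyvalue
  .map(x => (x._2, x._1)) // (count, word), so the count becomes the key
  .sortByKey(false)       // descending by count
  .map(x => (x._2, x._1)) // back to (word, count)
descendingSortByvalue.toDF.show // toDF needs spark.implicits._ (auto-imported in spark-shell)
// collect() first so the lines are printed on the driver, in sorted order
descendingSortByvalue.collect().foreach { n =>
  val word = n._1
  val count = n._2
  println(s"$word:$count")
}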

