
How to sort an RDD and get the top N elements using Scala?

I have an RDD of a case class (TopNModel) and want to get the top N elements from the given RDD, sorted by tx + rx in descending order. If two elements have the same tx + rx, they should be sorted by mac in ascending order.

case class TopNModel(mac: Long, tx: Int, rx: Int)

For example:

RDD[TopNModel(10L, 200, 100), TopNModel(12L, 100, 100), TopNModel(1L, 200, 400), TopNModel(11L, 100, 200)]

Sorted by tx + rx (descending), then by mac (ascending):

RDD[TopNModel(1L, 200, 400), TopNModel(10L, 200, 100), TopNModel(11L, 100, 200), TopNModel(12L, 100, 100)]

My Question:

  1. How do I sort so that, when the tx + rx values are equal, the elements are then sorted by mac?

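EDIT: per an important comment below, if the requirement really is to "get top N" entities in this order, sortBy is wasteful compared to takeOrdered: sortBy shuffles and sorts the entire RDD, whereas takeOrdered only keeps the N best elements of each partition and merges those. Use the second solution (the "Alternatively" one) with takeOrdered, for example rdd.takeOrdered(n).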
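To illustrate why takeOrdered is cheaper, here is a plain-Scala sketch (no Spark dependency) of the same strategy: keep only the N best elements of each partition in a bounded priority queue, then sort and trim the small merged result. This is an approximate model of the idea, not Spark's actual implementation; the object `TopN`, its `takeOrdered` method, and the Seq-of-Seqs standing in for an RDD's partitions are all hypothetical.

```scala
import scala.collection.mutable

case class TopNModel(mac: Long, tx: Int, rx: Int)

object TopN {
  // Desired order: larger tx + rx first, ties broken by smaller mac.
  val byTraffic: Ordering[TopNModel] =
    Ordering.by((m: TopNModel) => (-(m.tx + m.rx), m.mac))

  // Keep only the n "smallest" elements (under ord) of one partition.
  // PriorityQueue is a max-heap under ord, so head is the worst element kept so far.
  private def topOfPartition(part: Seq[TopNModel], n: Int, ord: Ordering[TopNModel]): List[TopNModel] = {
    val heap = mutable.PriorityQueue.empty[TopNModel](ord)
    part.foreach { m =>
      if (heap.size < n) heap.enqueue(m)
      else if (ord.lt(m, heap.head)) {
        heap.dequeue() // drop the current worst element
        heap.enqueue(m)
      }
    }
    heap.toList // heap order, not sorted; the merge step sorts
  }

  // Hypothetical local stand-in for rdd.takeOrdered(n)(ord): each inner Seq plays a partition.
  def takeOrdered(partitions: Seq[Seq[TopNModel]], n: Int, ord: Ordering[TopNModel]): Seq[TopNModel] =
    if (n <= 0) Seq.empty
    else partitions
      .flatMap(p => topOfPartition(p, n, ord)) // at most n survivors per partition
      .sorted(ord)                              // small final merge
      .take(n)
}
```

Only n elements per partition ever reach the merge step, which is the saving over sorting everything. On a real RDD the corresponding call would be something like `rdd.takeOrdered(n)(TopN.byTraffic)`, which returns an `Array[TopNModel]` to the driver.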


You can use the fact that tuples are ordered lexicographically, from the leftmost element to the right: create a tuple with the negated value of tx + rx (so that the sums are sorted in descending order) followed by the plain value of mac (which stays ascending):

val result = rdd.sortBy { case TopNModel(mac, tx, rx) => (-(tx + rx), mac) }
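The same key function can be tried on an ordinary Scala collection, since `Seq.sortBy` also takes a key function plus an implicit `Ordering` for the key. A minimal sketch, assuming a local `Seq` in place of the RDD (the object name `TupleKeySort` is hypothetical):

```scala
case class TopNModel(mac: Long, tx: Int, rx: Int)

object TupleKeySort {
  // The sample records from the question, as a local Seq instead of an RDD.
  val data: Seq[TopNModel] = Seq(
    TopNModel(10L, 200, 100),
    TopNModel(12L, 100, 100),
    TopNModel(1L, 200, 400),
    TopNModel(11L, 100, 200)
  )

  // Same key function as the RDD version: tuples compare element by element,
  // so -(tx + rx) decides first and mac only breaks ties.
  val result: Seq[TopNModel] = data.sortBy { case TopNModel(mac, tx, rx) => (-(tx + rx), mac) }

  // The tie in isolation: equal first elements, so the second element (mac) decides.
  val tieBreak: Int = Ordering[(Int, Long)].compare((-300, 10L), (-300, 11L))
}
```

Here `result.map(_.mac)` comes out as 1, 10, 11, 12, matching the expected order in the question: 10L and 11L both sum to 300, so the smaller mac comes first.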

Alternatively, if you want TopNModel to always be sorted this way (regardless of context), you can make it extend Ordered and implement its compare method. Sorting by identity will then use that compare method and give the same result:

case class TopNModel(mac: Long, tx: Int, rx: Int) extends Ordered[TopNModel] {
  import scala.math.Ordered.orderingToOrdered
  def compare(that: TopNModel): Int = (-(tx + rx), mac) compare (-(that.tx + that.rx), that.mac)
}

val result = rdd.sortBy(identity)
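Because TopNModel now extends Ordered, the standard library can derive an implicit `Ordering[TopNModel]` from it (via the low-priority `Ordering.ordered` implicit), so the class also works with APIs that take an implicit ordering, such as the takeOrdered call recommended in the EDIT. A local-collection sketch, with `sorted.take(n)` standing in for `rdd.takeOrdered(n)`; the object `OrderedTopN` and method `topN` are hypothetical, and `compare` calls the tuple Ordering directly, which should be equivalent to the `orderingToOrdered` version above:

```scala
case class TopNModel(mac: Long, tx: Int, rx: Int) extends Ordered[TopNModel] {
  // Larger tx + rx first, ties broken by smaller mac; calls the tuple Ordering
  // directly instead of going through the orderingToOrdered conversion.
  def compare(that: TopNModel): Int =
    Ordering[(Int, Long)].compare((-(tx + rx), mac), (-(that.tx + that.rx), that.mac))
}

object OrderedTopN {
  // Local stand-in for rdd.takeOrdered(n): `sorted` picks up the implicit
  // Ordering[TopNModel] derived from Ordered. Unlike takeOrdered, this does a full sort.
  def topN(data: Seq[TopNModel], n: Int): Seq[TopNModel] = data.sorted.take(n)
}
```

With the four sample records, `OrderedTopN.topN(data, 3)` keeps macs 1, 10 and 11, and `TopNModel(10L, 200, 100) < TopNModel(11L, 100, 200)` holds because the sums tie and the smaller mac wins.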

