I have an RDD[(VertexId, Double)], and I want to sort it by _._2 and join the index (rank) with this RDD, so that I can look up an element and its rank with filter.
Currently I sort the RDD with sortBy, but I do not know how to join an RDD with its rank, so I collect it as a sequence and zip it with its index. This is not efficient, and I am wondering if there is a more elegant way to do it.
The code I'm using right now is:
val tmpRes = graph.vertices.sortBy(_._2, ascending = false) // sort all vertices by PR score in descending order
.collect() // collect to the driver; this may be very expensive
tmpRes.zip(tmpRes.indices) // zip each element with its index
If, by any chance, you only want to bring the first n tuples back to the driver, you could use takeOrdered(n)(ordering), where n is the number of results to bring back and ordering is the (optional, implicit) comparator you'd like to use.
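As a rough sketch of the takeOrdered route, assuming the scores are plain (Long, Double) pairs (VertexId is an alias for Long in GraphX) and a local SparkContext: the helper name topN and the sample data are made up for illustration. Because takeOrdered returns the smallest elements under the given ordering, the ordering on the score is reversed to get the highest scores first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TopNSketch {
  // Hypothetical helper: the n pairs with the highest score, best first.
  def topN(scores: RDD[(Long, Double)], n: Int): Array[(Long, Double)] = {
    // Order pairs by score, reversed so that "smallest" means highest score.
    val byScoreDesc = Ordering.by[(Long, Double), Double](_._2).reverse
    scores.takeOrdered(n)(byScoreDesc)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("top-n-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up (vertexId, score) pairs standing in for graph.vertices.
      val scores = sc.parallelize(Seq((1L, 0.15), (2L, 0.60), (3L, 0.25)))
      topN(scores, 2).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Only n elements are sent to the driver here, so this avoids collecting the whole RDD when just the head of the ranking is needed.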
Otherwise, you can use the zipWithIndex transformation, which will transform your RDD[(VertexId, Double)] into an RDD[((VertexId, Double), Long)] with the proper index (of course, you should do that after your sort).
For example:
scala> val data = sc.parallelize(List(("A", 1), ("B", 2)))
scala> val sorted = data.sortBy(_._2)
scala> sorted.zipWithIndex.collect()
res1: Array[((String, Int), Long)] = Array(((A,1),0), ((B,2),1))
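Applied to the question's use case, this might look roughly like the sketch below. It assumes (Long, Double) pairs in place of graph.vertices and a local SparkContext; the helper name withRank and the sample scores are invented for illustration. The rank is 0-based, with 0 being the highest score. Note that zipWithIndex runs a Spark job of its own when the RDD has more than one partition.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RankSketch {
  // Hypothetical helper: key each vertex by id and attach (score, rank).
  def withRank(scores: RDD[(Long, Double)]): RDD[(Long, (Double, Long))] =
    scores
      .sortBy(_._2, ascending = false) // highest score first
      .zipWithIndex()                  // pair every element with its position
      .map { case ((id, score), rank) => (id, (score, rank)) }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rank-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up (vertexId, score) pairs standing in for graph.vertices.
      val scores = sc.parallelize(Seq((1L, 0.15), (2L, 0.60), (3L, 0.25)))
      val ranked = withRank(scores).cache()

      // Rank of one vertex, found with filter as in the question.
      ranked.filter { case (id, _) => id == 3L }.collect().foreach(println)

      // Vertex at a given rank.
      ranked.filter { case (_, (_, rank)) => rank == 0L }.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Everything stays distributed until the final filter, so only the matching elements reach the driver.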