Finding Maximum in Key Value RDD
I have a key-value RDD of the form:
(Some(23661587),
CompactBuffer(Posting(2,23661643,Some(23661587),0,None),
Posting(2,23661682,Some(23661587),0,None)))
Here Some(23661587) is the key and the data inside CompactBuffer is the value. For each key, I want to select the Posting with the maximum value of a particular attribute.
How can I do that? I have limited experience in Scala and Spark.
Thanks
I reproduced your example with some sample data.
As @sinanspd said, org.apache.spark.util.collection.CompactBuffer extends Scala's Seq (see the CompactBuffer source), so you can use the ordinary Seq methods on it: sort the postings and take the Posting with the maximum value.
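Because CompactBuffer is a Seq, the per-key selection is ordinary Scala collection code and can be tried without Spark. The sketch below uses a plain List as a stand-in for one key's buffer (an assumption for illustration; the object name SeqMaxDemo is hypothetical) and compares sortBy(...).last with maxBy, which finds the maximum in a single O(n) pass instead of an O(n log n) sort.

```scala
object SeqMaxDemo {
  // Field names follow the answer's illustrative Posting class.
  case class Posting(key: Int, value: Long, value2: Option[Long], value3: Int, value4: Option[Int])

  // A plain List standing in for one key's CompactBuffer (both are Seqs).
  val postings: Seq[Posting] = List(
    Posting(2, 23661643L, Some(23661587L), 0, None),
    Posting(2, 23661682L, Some(23661587L), 0, None)
  )

  // O(n log n): sort ascending by `value`, then take the last element.
  val viaSort: Posting = postings.sortBy(_.value).last

  // O(n): one pass, keeping the element with the largest `value`.
  val viaMaxBy: Posting = postings.maxBy(_.value)

  def main(args: Array[String]): Unit = {
    println(viaSort)
    println(viaMaxBy)
  }
}
```

The two differ only on ties: maxBy returns the first posting with the largest value, while sortBy is stable, so sortBy(...).last returns the last one. Both throw on an empty Seq, which does not arise for groups produced by groupByKey, since every group has at least one element.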
I chose Posting.value as the sort key, but it could be value2 or any other field of the Posting class.
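Switching the attribute only changes the function passed to sortBy or maxBy. A minimal sketch, assuming the same illustrative Posting fields and a hypothetical helper maxPostingBy: any field whose type has an implicit Ordering works, including Option fields such as value2, where None orders below every Some.

```scala
object FieldChoiceDemo {
  // Same illustrative fields as the answer's Posting class.
  case class Posting(key: Int, value: Long, value2: Option[Long], value3: Int, value4: Option[Int])

  // Hypothetical helper: the attribute to maximise is passed in as a function.
  // The context bound requires an implicit Ordering for that attribute's type.
  def maxPostingBy[B: Ordering](postings: Seq[Posting])(field: Posting => B): Posting =
    postings.maxBy(field)

  val postings: Seq[Posting] = List(
    Posting(2, 10L, None, 1, None),
    Posting(2, 7L, Some(3L), 5, Some(9)),
    Posting(2, 4L, Some(8L), 2, None)
  )

  val byValue: Posting = maxPostingBy(postings)(_.value)   // Long field
  val byValue2: Posting = maxPostingBy(postings)(_.value2) // Option[Long]: None < Some(3) < Some(8)
  val byValue3: Posting = maxPostingBy(postings)(_.value3) // Int field

  def main(args: Array[String]): Unit =
    List(byValue, byValue2, byValue3).foreach(println)
}
```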
As an example:
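import org.apache.spark.sql.SparkSession

object FindingMaximum {
  // Local SparkSession using all cores, for trying the example.
  val spark = SparkSession
    .builder()
    .appName("FindingMaximum")
    .master("local[*]")
    .getOrCreate()
  val sc = spark.sparkContext

  // Field names are illustrative; the question shows only the constructor arguments.
  case class Posting(key: Int, value: Long, value2: Option[Long], value3: Int, value4: Option[Int])

  // Sample data shaped like the grouped RDD in the question: (key, postings for that key).
  val data = List(
    (Some(23661587), Seq(Posting(2, 23661643, Some(23661587), 0, None), Posting(2, 23661682, Some(23661587), 0, None))),
    (Some(23661588), Seq(Posting(3, 23661743, Some(23661588), 0, None), Posting(3, 23661682, Some(23661588), 0, None))),
    (Some(23661589), Seq(Posting(4, 23661843, Some(23661589), 0, None), Posting(4, 23661882, Some(23661589), 0, None))))

  def main(args: Array[String]): Unit = {
    sc.setLogLevel("ERROR")
    val rdd = sc.parallelize(data)
    // For each key, sort its postings by `value` ascending and keep the last, i.e. the largest.
    val rddKeyMax = rdd.map { case (key, v) =>
      val max = v.sortBy(posting => posting.value).last
      (key, max)
    }
    // foreach runs on the executors: locally the lines show up in this console,
    // on a cluster they go to the executor logs instead.
    rddKeyMax.foreach(println)
    spark.stop()
  }
}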
/*
(Some(23661588),Posting(3,23661743,Some(23661588),0,None))
(Some(23661587),Posting(2,23661682,Some(23661587),0,None))
(Some(23661589),Posting(4,23661882,Some(23661589),0,None))
*/
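If the grouped RDD came from groupByKey, a common alternative is to skip the grouping and call reduceByKey on the original (key, Posting) pairs; reduceByKey combines values within each partition before the shuffle instead of collecting every posting for a key into one buffer. It takes a (V, V) => V function. The sketch below defines such a reducer (keepMax and ReduceByKeyDemo are hypothetical names) and checks it on a local collection, where groupBy plus reduce stands in for Spark. The un-grouped pair data is an assumption; the question only shows the grouped form.

```scala
object ReduceByKeyDemo {
  // Illustrative field names, as in the answer's example.
  case class Posting(key: Int, value: Long, value2: Option[Long], value3: Int, value4: Option[Int])

  // Reducer with the (V, V) => V shape that reduceByKey expects:
  // keeps the posting with the larger `value`, and `a` on a tie.
  def keepMax(a: Posting, b: Posting): Posting =
    if (b.value > a.value) b else a

  // Hypothetical un-grouped data: one (key, Posting) pair per row.
  val pairs: Seq[(Option[Long], Posting)] = List(
    (Some(23661587L), Posting(2, 23661643L, Some(23661587L), 0, None)),
    (Some(23661587L), Posting(2, 23661682L, Some(23661587L), 0, None)),
    (Some(23661588L), Posting(3, 23661743L, Some(23661588L), 0, None)),
    (Some(23661588L), Posting(3, 23661682L, Some(23661588L), 0, None))
  )

  // Local stand-in for reduceByKey: group the pairs by key, then
  // combine each group's postings pairwise with keepMax.
  val maxPerKey: Map[Option[Long], Posting] =
    pairs.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).reduce((a, b) => keepMax(a, b))
    }

  def main(args: Array[String]): Unit =
    maxPerKey.foreach(println)
}
```

With an un-grouped RDD[(Option[Long], Posting)] named, say, pairs, the Spark call would be pairs.reduceByKey(keepMax). Spark does not fix the order in which values are combined, so which of two tied postings survives is not guaranteed. If only the grouped RDD is available, grouped.mapValues(_.maxBy(_.value)) applies the single-pass maxBy to each buffer.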
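Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.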