Spark - aggregateByKey type mismatch error
I am trying to find the problem behind this. I am trying to find the maximum Marks for each student using aggregateByKey.
import spark.implicits._  // needed for toDF

val data = Seq(("R1","M",22),("R1","E",25),("R1","F",29),
               ("R2","M",20),("R2","E",32),("R2","F",52))
  .toDF("Name","Subject","Marks")
def seqOp = (acc: Int, ele: (String, Int)) => if (acc > ele._2) acc else ele._2
def combOp = (acc: Int, acc1: Int) => if (acc > acc1) acc else acc1
val r = data.rdd.map{case(t1,t2,t3)=> (t1,(t2,t3))}.aggregateByKey(0)(seqOp,combOp)
I am getting an error that aggregateByKey accepts (Int, (Any, Any)) but the actual type is (Int, (String, Int)).
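To make the expected result concrete, here is a minimal plain-Scala sketch (no Spark) of the per-key fold I want aggregateByKey to perform; the data and the two functions mirror the code above:

```scala
// Hypothetical local data mirroring the DataFrame rows, keyed by Name.
val records = Seq(("R1", ("M", 22)), ("R1", ("E", 25)), ("R1", ("F", 29)),
                  ("R2", ("M", 20)), ("R2", ("E", 32)), ("R2", ("F", 52)))

val seqOp = (acc: Int, ele: (String, Int)) => if (acc > ele._2) acc else ele._2
val combOp = (acc: Int, acc1: Int) => if (acc > acc1) acc else acc1

// aggregateByKey folds each partition with seqOp from the zero value 0,
// then merges partial results with combOp; here, one fold per key suffices.
val maxByKey = records.groupBy(_._1).map { case (k, vs) =>
  k -> vs.map(_._2).foldLeft(0)(seqOp)
}
// maxByKey: Map(R1 -> 29, R2 -> 52)
```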
Your map function is incorrect, since its input is a Row, not a Tuple3.
Fix the last line with:
val r = data.rdd.map { r =>
  val t1 = r.getAs[String](0)
  val t2 = r.getAs[String](1)
  val t3 = r.getAs[Int](2)
  (t1, (t2, t3))
}.aggregateByKey(0)(seqOp, combOp)
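As an alternative sketch, the same extraction can be written by pattern matching on the Row directly (Spark's Row provides an extractor for this); the column types (String, String, Int) are assumed from the toDF call above:

```scala
import org.apache.spark.sql.Row

// Destructure each Row by position instead of calling getAs three times.
// Assumes the schema declared above: Name: String, Subject: String, Marks: Int.
val r2 = data.rdd.map { case Row(name: String, subject: String, marks: Int) =>
  (name, (subject, marks))
}.aggregateByKey(0)(seqOp, combOp)
// r2 is an RDD[(String, Int)]: the maximum Marks per Name.
```

Note that the type ascriptions in the pattern are what give you String and Int back out of the untyped Row; a row whose fields do not match would throw a MatchError at runtime, so getAs is the safer choice when the schema is not guaranteed.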