Spark groupBy not getting expected type (mismatch error)
I am trying to get this variable GroupsByP to have a certain type: GroupsByP is defined from a db connection select/collect statement which has 3 fields: 2 strings (p and id) and an int (order).
Expected result should be of the form Map[p,Set[(Id,Order)]]
val GroupsByP = db.pLinkGroups.collect()
.groupBy(_.p)
.map(group => group._1 -> (group._2.map(_.Id -> group._2.map(_.Order)).toSet))
My desired type for this variable is Map[String, Set[(String, Int)]], but the actual type is Map[String, Set[(String, Array[Int])]].
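For reference, the mismatch reproduces without the database. Below is a minimal sketch assuming a hypothetical case class PLinkGroup standing in for the collected rows (the name and the capitalized Id/Order fields are guesses taken from the code):

case class PLinkGroup(p: String, Id: String, Order: Int) // hypothetical row type
val rows = Array(PLinkGroup("a", "x", 1), PLinkGroup("a", "y", 2))
val bad = rows
  .groupBy(_.p)
  // the inner group._2.map(_.Order) runs over the whole group,
  // so every Id gets paired with an Array[Int] of all the orders
  .map(group => group._1 -> group._2.map(_.Id -> group._2.map(_.Order)).toSet)
// bad: Map[String, Set[(String, Array[Int])]]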
If I got your question right, this should do it:
val GroupsByP: Map[String, Set[(String, Int)]] = input.collect()
.groupBy(_.p)
.map(group => group._1 -> group._2.map(record => (record.Id, record.Order)).toSet)
You should be mapping each record into an (Id, Order) tuple.
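Using the same hypothetical PLinkGroup rows as in the reproduction above, this yields the expected shape:

val GroupsByP: Map[String, Set[(String, Int)]] = rows
  .groupBy(_.p)
  .map(group => group._1 -> group._2.map(record => (record.Id, record.Order)).toSet)
// GroupsByP == Map("a" -> Set(("x", 1), ("y", 2)))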
A very similar but perhaps more readable implementation might be:
val GroupsByP: Map[String, Set[(String, Int)]] = input.collect()
.groupBy(_.p)
.mapValues(_.map(record => (record.Id, record.Order)).toSet)
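One caveat worth adding, not part of the original answer: on Scala 2.12, Map.mapValues returns a lazy view that re-applies the function on every access, and the method is deprecated in 2.13. The first variant (plain map) is already strict; on Scala 2.13 the equivalent non-deprecated spelling of the mapValues version is:

val GroupsByP: Map[String, Set[(String, Int)]] = input.collect()
  .groupBy(_.p)
  .view
  .mapValues(_.map(record => (record.Id, record.Order)).toSet)
  .toMap // forces the view into a strict Map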