
Spark groupBy does not produce the expected type (type mismatch error)

I am trying to give the variable GroupsByP a specific type. GroupsByP is built from a select/collect statement on a db connection whose records have 3 fields: 2 strings (p and Id) and an int (Order).

The expected result should have the form Map[p, Set[(Id, Order)]].

val GroupsByP = db.pLinkGroups.collect()
  .groupBy(_.p)
  .map(group => group._1 -> (group._2.map(_.Id -> group._2.map(_.Order)).toSet))

My desired type for this variable is

Map[String, Set[(String, Int)]]

but the actual type is Map[String, Set[(String, Array[Int])]].

If I understood your question correctly, this should do it:

val GroupsByP: Map[String, Set[(String, Int)]] = input.collect()
    .groupBy(_.p)
    .map(group => group._1 -> group._2.map(record => (record.Id, record.Order)).toSet)

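You should be mapping each record into an (Id, Order) tuple. In your version, `_.Id -> group._2.map(_.Order)` pairs each Id with the Order values of the entire group, which is why the second element of the tuple comes out as Array[Int] instead of Int.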
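To try the per-record mapping without a Spark cluster, here is a minimal sketch. It assumes a hypothetical case class `PLink` with the three fields from the question, and a local array standing in for the result of `collect()`; neither name comes from the original code.

```scala
// Hypothetical record type standing in for a row of db.pLinkGroups;
// the field names (p, Id, Order) follow the question.
final case class PLink(p: String, Id: String, Order: Int)

object GroupByTuples {
  // Group the records by p, then turn each record of a group
  // into an (Id, Order) tuple and collect the tuples into a Set.
  def groupsByP(records: Array[PLink]): Map[String, Set[(String, Int)]] =
    records
      .groupBy(_.p)
      .map { case (p, group) => p -> group.map(r => (r.Id, r.Order)).toSet }

  def main(args: Array[String]): Unit = {
    // A local array plays the role of the collected result (assumption).
    val records = Array(
      PLink("p1", "a", 1),
      PLink("p1", "b", 2),
      PLink("p2", "c", 3)
    )
    println(groupsByP(records))
  }
}
```

The pattern match `{ case (p, group) => ... }` is only a naming convenience over `group._1` and `group._2`; the result type is the same.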

A very similar but perhaps more readable implementation might be:

val GroupsByP: Map[String, Set[(String, Int)]] = input.collect()
    .groupBy(_.p)
    .mapValues(_.map(record => (record.Id, record.Order)).toSet)
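One caveat about the `mapValues` variant: on Scala 2.12 it returns a lazy view that re-applies the function on every lookup, and on Scala 2.13 it is deprecated and returns a `MapView`, so the `Map[...]` annotation would no longer compile on its own. A sketch that should work on both versions adds a final `.toMap` (on 2.13 the deprecation message suggests `.view.mapValues(f).toMap` instead). The case class `PLinkRow` is a hypothetical stand-in for the record type.

```scala
// Hypothetical record type; the field names follow the question.
final case class PLinkRow(p: String, Id: String, Order: Int)

object GroupByMapValues {
  def groupsByP(records: Array[PLinkRow]): Map[String, Set[(String, Int)]] =
    records
      .groupBy(_.p)
      .mapValues(_.map(r => (r.Id, r.Order)).toSet)
      .toMap // force a strict Map; without this, 2.13 yields a MapView
}
```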
