Scala - Join dataframes with a 1-to-n relation
I have two dataframes:
Dataframe - House(adress, number, zipcode)
adress 1, 28, 04030
adress 2, 01, 25040
Dataframe - People(name, adress, age)
Miki , adress 1, 15
Sterling , adress 2, 20
Archer, adress 2, 25
I need to join both of them into a third dataframe, Filled_HouseHouse(adress, number, zipcode, member1, member2, member3, member4), like this:
adress 1, 28, 04030, Miki, null, null, null
adress 2, 01, 25040, Sterling, Archer, null, null
In Scala + Spark I believe map and groupBy could be the answer, but I have not figured out the proper way to do it.
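The group-by-then-map idea can be checked on plain Scala collections first, outside Spark. This is a minimal sketch, assuming hypothetical case classes House, Person and FilledHouse that mirror the two dataframes, and a fixed cap of four member slots where None stands in for null.

```scala
// Plain Scala (no Spark) sketch of the group-by-then-map idea on in-memory data.
// House, Person and FilledHouse are hypothetical case classes mirroring the dataframes.
final case class House(adress: String, number: String, zipcode: String)
final case class Person(name: String, adress: String, age: Int)
final case class FilledHouse(adress: String, number: String, zipcode: String, members: Seq[Option[String]])

object FilledHouseSketch {
  // Hypothetical cap matching member1..member4; residents beyond the fourth are dropped.
  val MaxMembers = 4

  def fill(houses: Seq[House], people: Seq[Person]): Seq[FilledHouse] = {
    // group by: address -> names of its residents (groupBy keeps input order within a group)
    val namesByAdress: Map[String, Seq[String]] =
      people.groupBy(_.adress).map { case (adress, residents) => adress -> residents.map(_.name) }

    // map: pad each house's name list to MaxMembers slots; None plays the role of null
    houses.map { h =>
      val names = namesByAdress.getOrElse(h.adress, Seq.empty[String])
      val slots = names.take(MaxMembers).map(n => Option(n)).padTo(MaxMembers, Option.empty[String])
      FilledHouse(h.adress, h.number, h.zipcode, slots)
    }
  }

  def main(args: Array[String]): Unit = {
    val houses = Seq(House("adress 1", "28", "04030"), House("adress 2", "01", "25040"))
    val people = Seq(
      Person("Miki", "adress 1", 15),
      Person("Sterling", "adress 2", 20),
      Person("Archer", "adress 2", 25)
    )
    fill(houses, people).foreach(println)
  }
}
```

The fixed slot count is what turns a 1-to-n relation into a flat row: each house gets exactly four member positions, filled from its group and padded with empty values.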
Thanks for your time!
Using
import org.apache.spark.sql.functions.collect_list
val peopleUnified = people.groupBy("adress").agg(collect_list("name"))
I got:
adress 1, [Miki]
adress 2, [Sterling, Archer]
So the next step is to split the collected list into the member columns and fill them in through the join.
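One way that next step could look in Spark, as a minimal sketch assuming Spark 3.x with the default (non-ANSI) SQL settings: collect_list gathers the names per address, Column.getItem(i) splits the array into member1..member4 (in non-ANSI mode it returns null past the end of the array), and a left join on adress attaches the members to each house. The object FilledHouseSpark, the helper fillHouses and the cap MaxMembers = 4 are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

object FilledHouseSpark {
  // Hypothetical cap matching member1..member4 in Filled_HouseHouse.
  val MaxMembers = 4

  def fillHouses(house: DataFrame, people: DataFrame): DataFrame = {
    // One row per address, with every resident's name collected into an array column.
    val peopleUnified = people
      .groupBy("adress")
      .agg(collect_list("name").as("members"))

    // member1..member4: getItem(i) yields null past the end of the array (non-ANSI mode).
    val memberCols: Seq[Column] =
      (0 until MaxMembers).map(i => col("members").getItem(i).as(s"member${i + 1}"))

    // Left join keeps houses with no residents; their member columns stay null.
    val outputCols: Seq[Column] = Seq(col("adress"), col("number"), col("zipcode")) ++ memberCols
    house
      .join(peopleUnified, Seq("adress"), "left")
      .select(outputCols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("filled-house").getOrCreate()
    import spark.implicits._

    // Sample data from the question; number and zipcode are strings to keep leading zeros.
    val house = Seq(("adress 1", "28", "04030"), ("adress 2", "01", "25040"))
      .toDF("adress", "number", "zipcode")
    val people = Seq(("Miki", "adress 1", 15), ("Sterling", "adress 2", 20), ("Archer", "adress 2", 25))
      .toDF("name", "adress", "age")

    fillHouses(house, people).show()
    spark.stop()
  }
}
```

Two caveats with this approach: collect_list does not guarantee element order after a shuffle, so Sterling and Archer may land in either member column unless the list is sorted (for example with sort_array) or ordered by some key; and any residents beyond the fourth are silently dropped.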