Scala - Join dataframes with a 1-to-n relation

I have two dataframes:

Dataframe - House(address, number, zipcode)

adress 1, 28, 04030
adress 2, 01, 25040

Dataframe - People(name, address, age)

Miki , adress 1, 15
Sterling , adress 2, 20
Archer, adress 2, 25

I need to join both of them into a third dataframe - Filled_House(address, number, zipcode, member1, member2, member3, member4), like this:

 adress 1, 28, 04030, Miki, null, null, null
 adress 2, 01, 25040, Sterling, Archer, null, null

In Scala + Spark, I believe using map and groupBy could be the answer, but I have not figured out the proper way to do it.

Thanks for your time!

Using

import org.apache.spark.sql.functions.collect_list
val peopleUnified = people.groupBy("address").agg(collect_list("name"))

I got

adress 1, [Miki]
adress 2, [Sterling, Archer]

So the next step is splitting the created list and filling the member fields through the join.
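
One possible way to do that remaining step is sketched below. It is not a confirmed answer from the thread. It assumes a local SparkSession, string-typed number and zipcode columns, and a fixed maximum of four members. The object name FilledHouseSketch, the helper fillHouse, and the intermediate column name "members" are hypothetical names introduced only for this illustration. Each memberN column is read from the collected list by index, guarded by size so that a missing position yields null instead of an error.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, size, when}

// Hypothetical sketch: names below are illustrative, not from the original post.
object FilledHouseSketch {

  // Collect the names per address, left-join onto house, and spread the
  // list into member1..memberN columns (null where no member exists).
  def fillHouse(house: DataFrame, people: DataFrame, maxMembers: Int): DataFrame = {
    val peopleUnified = people
      .groupBy("address")
      .agg(collect_list("name").as("members"))

    // when(...) without otherwise(...) yields null when the condition is
    // false or null, so short lists and houses without people give null.
    val memberCols: Seq[Column] = (0 until maxMembers).map { i =>
      when(size(col("members")) > i, col("members").getItem(i)).as(s"member${i + 1}")
    }

    house
      .join(peopleUnified, Seq("address"), "left")
      .select(Seq(col("address"), col("number"), col("zipcode")) ++ memberCols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for demonstration
      .appName("filled-house-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample rows copied from the question (values kept verbatim).
    val house = Seq(
      ("adress 1", "28", "04030"),
      ("adress 2", "01", "25040")
    ).toDF("address", "number", "zipcode")

    val people = Seq(
      ("Miki", "adress 1", 15),
      ("Sterling", "adress 2", 20),
      ("Archer", "adress 2", 25)
    ).toDF("name", "address", "age")

    fillHouse(house, people, maxMembers = 4).show(truncate = false)
    spark.stop()
  }
}
```

Two caveats apply to this sketch. collect_list does not guarantee the order of the collected names after a shuffle, so which person ends up in member1 versus member2 may vary between runs. Any house with more than maxMembers people silently loses the extra names.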
