
Union two DataFrames with different nested schemas

Dataframe1 looks like this:

root
 |-- source: string (nullable = true)
 |-- results: array (nullable = true)
 |    |-- content: struct (containsNull = true)
 |    |    |-- ptype: string (nullable = true)
 |    |    |-- domain: string (nullable = true)
 |    |    |-- verb: string (nullable = true)
 |    |    |-- foobar: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |    |    |-- fooId: integer (nullable = true)
 |-- date: string (nullable = false)
 |-- hour: string (nullable = false)

Dataframe2 looks like this:

root
 |-- source: string (nullable = true)
 |-- results: array (nullable = true)
 |    |-- content: struct (containsNull = true)
 |    |    |-- ptype: string (nullable = true)
 |    |    |-- domain: string (nullable = true)
 |    |    |-- verb: string (nullable = true)
 |    |    |-- foobar: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
 |-- date: string (nullable = false)
 |-- hour: string (nullable = false)

Notice the difference: there is no fooId in the second DataFrame. How can I union these two DataFrames? I understand that the two schemas need to be the same for a union. What is the best way to add fooId or remove fooId? (This is non-trivial because of the nested structure of the schema.) What is the recommended approach for a union of this kind? Thanks.

Call the two DataFrames DF1 and DF2. You could remove the extra column from DF1 and then union the two DataFrames.

// Remove the extra column. DataFrames are immutable, so keep the result.
// Note: drop only removes top-level columns; it does not reach a field
// nested inside the results array, such as fooId in the schema above.
val trimmed = DF1.drop("fooId")

Once both DataFrames have the same columns with the same types, you can union them:

trimmed.union(DF2)
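For the nested case in the question, where fooId lives inside each element of the results array, one possible approach is to rebuild the array elements so both sides end up with the same struct. The following is only a sketch under some assumptions: Spark 2.4 or later (for the SQL transform higher-order function), and ptype, domain, verb, foobar and fooId being direct fields of each array element. The object name NestedUnionSketch and the helper names dropFooId and addNullFooId are made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.expr

// Hypothetical helpers; field names are taken from the schemas in the question.
object NestedUnionSketch {

  // Rebuild every element of `results` without the fooId field.
  def dropFooId(df: DataFrame): DataFrame =
    df.withColumn("results", expr(
      """transform(results, r -> named_struct(
        |  'ptype',  r.ptype,
        |  'domain', r.domain,
        |  'verb',   r.verb,
        |  'foobar', r.foobar))""".stripMargin))

  // Rebuild every element of `results` with an extra fooId field set to NULL.
  def addNullFooId(df: DataFrame): DataFrame =
    df.withColumn("results", expr(
      """transform(results, r -> named_struct(
        |  'ptype',  r.ptype,
        |  'domain', r.domain,
        |  'verb',   r.verb,
        |  'foobar', r.foobar,
        |  'fooId',  CAST(NULL AS INT)))""".stripMargin))
}
```

With these helpers, either NestedUnionSketch.dropFooId(DF1).union(DF2) or DF1.union(NestedUnionSketch.addNullFooId(DF2)) should line the schemas up. Since union matches columns by position, the field order inside the rebuilt struct has to be the same on both sides. Which direction to choose depends on whether you need to keep the fooId values from DF1.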


 