
Union two DataFrames of different types in Spark

In my recent project, I need to union two DataFrames whose columns have different types. Here is my sample data:

df1:

name    number    address
kevin   101        NZ
gevin   102        CA

Here all the fields are of type string.

df2:

name    number    address
kevin   [101,102]    NZ
gevin   [102,103]    CA

Here name and address are of type string, and number is of type array<string>.
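To reproduce the two inputs described above, a minimal sketch along these lines could be used. It assumes a local SparkSession; the object name `SampleData` and the app name are illustrative and not part of the original question.

```scala
import org.apache.spark.sql.SparkSession

object SampleData {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for experimenting.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("union-sample-data")
      .getOrCreate()
    import spark.implicits._

    // df1: every column is a string.
    val df1 = Seq(
      ("kevin", "101", "NZ"),
      ("gevin", "102", "CA")
    ).toDF("name", "number", "address")

    // df2: number is a Seq[String], which Spark encodes as array<string>.
    val df2 = Seq(
      ("kevin", Seq("101", "102"), "NZ"),
      ("gevin", Seq("102", "103"), "CA")
    ).toDF("name", "number", "address")

    // Print both schemas to see the type mismatch on the number column.
    df1.printSchema()
    df2.printSchema()

    spark.stop()
  }
}
```

Calling `df1.union(df2)` directly on these inputs should be rejected at analysis time, because the number column is a string on one side and an array on the other.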

Now I need to union these two DataFrames. My expected outcome is like:

name    number    address
kevin   101         NZ
gevin   102         CA
kevin   [101,102]   NZ
gevin   [102,103]   CA

The column types of the final DataFrame should be the same as those of df2 (string, array<string>, string).

You can convert the number column of the first DataFrame to an array as well, and then union the two DataFrames.

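import org.apache.spark.sql.functions._

// Wrap df1's string column in a one-element array so that it matches df2's array<string> column.
// col("number") is used instead of $"number" so that no spark.implicits._ import is needed.
df1.withColumn("number", array(col("number")))
   .union(df2)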

Output:

+-----+----------+-------+
|name |number    |address|
+-----+----------+-------+
|kevin|[101]     |NZ     |
|gevin|[102]     |CA     |
|kevin|[101, 102]|NZ     |
|gevin|[102, 103]|CA     |
+-----+----------+-------+
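If it is not known in advance which side holds the plain string column, the same idea can be wrapped in a small helper. This is only a sketch: `UnionHelpers`, `asArrayColumn`, and `unionAsArray` are hypothetical names introduced here, not part of Spark or of the original answer.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col}
import org.apache.spark.sql.types.ArrayType

object UnionHelpers {
  // Hypothetical helper: returns df unchanged if `column` is already an array,
  // otherwise wraps each value of `column` in a one-element array.
  def asArrayColumn(df: DataFrame, column: String): DataFrame =
    df.schema(column).dataType match {
      case _: ArrayType => df
      case _            => df.withColumn(column, array(col(column)))
    }

  // Hypothetical helper: normalizes `column` on both sides, then unions them.
  // union matches columns by position, so both inputs are assumed to have
  // the same column order.
  def unionAsArray(left: DataFrame, right: DataFrame, column: String): DataFrame =
    asArrayColumn(left, column).union(asArrayColumn(right, column))
}
```

With the sample data from the question, `UnionHelpers.unionAsArray(df1, df2, "number")` would be expected to produce the same rows as the snippet above. The helper does not check that the array element types agree, so a mismatch there would still be reported by Spark when the union is analyzed.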

Hope this helps!

