Union two DataFrames of different types in Spark
In my recent project, I need to union two DataFrames with different schemas. For example, here is my sample data:
df1:
name number address
kevin 101 NZ
gevin 102 CA
Here, all the fields are of type string.
df2:
name number address
kevin [101,102] NZ
gevin [102,103] CA
Here, name and address are of type string and number is of type array<string>.
Now I need to union these two DataFrames. My expected outcome looks like:
name number address
kevin 101 NZ
gevin 102 CA
kevin [101,102] NZ
gevin [102,103] CA
The final DataFrame's column types should be the same as df2's (string, array<string>, string).
You can convert number to an array in the first DataFrame as well, and then union both DataFrames:
import org.apache.spark.sql.functions._

// Wrap df1's scalar number column in a single-element array so the
// schemas of both DataFrames match, then union them.
df1.withColumn("number", array($"number"))
  .union(df2)
Output:
+-----+----------+-------+
|name |number    |address|
+-----+----------+-------+
|kevin|[101]     |NZ     |
|gevin|[102]     |CA     |
|kevin|[101, 102]|NZ     |
|gevin|[102, 103]|CA     |
+-----+----------+-------+
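The key requirement here is that union only works when both sides share the exact same column types, which is why df1's scalar number must first be lifted into a one-element array. The same type-lifting idea can be sketched in plain Scala without Spark, modelling each row as a small case class (the Row class and values below are illustrative, not part of the original code):

```scala
// Plain-Scala sketch of the type-lifting union (illustrative only, no Spark).
// df2-style rows already carry number as a sequence of strings.
case class Row(name: String, number: Seq[String], address: String)

// df1-style rows: number is a single scalar string.
val df1 = Seq(("kevin", "101", "NZ"), ("gevin", "102", "CA"))

val df2 = Seq(
  Row("kevin", Seq("101", "102"), "NZ"),
  Row("gevin", Seq("102", "103"), "CA")
)

// Lift df1's scalar number into a one-element Seq so both sides share one
// shape, mirroring what array($"number") does in the Spark answer.
val lifted = df1.map { case (name, number, address) =>
  Row(name, Seq(number), address)
}

// The "union" is then just concatenation of the two uniformly-typed sequences.
val unioned = lifted ++ df2
unioned.foreach(println)
```

If the column order might differ between the two DataFrames, Spark's unionByName is a safer choice than union, since union matches columns by position rather than by name.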
Hope this helps!