How to merge 2 Spark dataframes using an if-else condition

Question

How can we merge 2 dataframes and form a new dataframe using conditions? For example: if data is present in dataframe B, use the row from dataframe B, else use the data from dataframe A.

DataFrame A

+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-23 12:33:00|       1|logout|
|Alice|2015-04-20 12:33:00|       5| login|
+-----+-------------------+--------+------+

DataFrame B

+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-24 00:33:00|       1|login |
+-----+-------------------+--------+------+

I want to form a new dataframe that contains all the data in DataFrame A, but with rows updated using the data in B:

+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-24 00:33:00|       1|login |
|Alice|2015-04-20 12:33:00|       5| login|
+-----+-------------------+--------+------+

I tried a full outer join as

val joined = df.as("a").join(df.as("b")).where($"a.name" === $"b.name","outer")

But it resulted in 1 row with duplicate columns. How can I ignore the row in the first table if a corresponding row is present in the second?

Answer 1

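import org.apache.spark.sql.functions.coalesce

// Left join keeps every row of dfa; coalesce takes dfb's value when a match exists.
val combine_df = dfa.join(dfb, Seq("Name"), "left").select(
  dfa("Name"),
  coalesce(dfb("LastTime"), dfa("LastTime")).as("LastTime"),
  coalesce(dfb("Duration"), dfa("Duration")).as("Duration"),
  coalesce(dfb("Status"), dfa("Status")).as("Status"))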
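For reference, the coalesce approach can be sketched end to end with the sample data from the question. This is a minimal sketch, not a definitive implementation: it assumes a left join from A (so every row of A survives) and lists B's column first in each coalesce (so B wins whenever it has a match). The object name MergePreferB, the helper mergePreferB, and the local SparkSession settings are hypothetical names chosen for illustration, and LastTime is kept as a plain string for brevity.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col}

object MergePreferB {
  // Hypothetical helper: returns every row of dfa, with column values
  // taken from dfb wherever dfb has a row with the same Name.
  def mergePreferB(dfa: DataFrame, dfb: DataFrame): DataFrame =
    dfa.join(dfb, Seq("Name"), "left").select(
      col("Name"), // the join key appears only once after a Seq-based join
      coalesce(dfb("LastTime"), dfa("LastTime")).as("LastTime"),
      coalesce(dfb("Duration"), dfa("Duration")).as("Duration"),
      coalesce(dfb("Status"), dfa("Status")).as("Status"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .appName("merge-prefer-b-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val dfa = Seq(
      ("Bob", "2015-04-23 12:33:00", 1, "logout"),
      ("Alice", "2015-04-20 12:33:00", 5, "login")
    ).toDF("Name", "LastTime", "Duration", "Status")

    val dfb = Seq(
      ("Bob", "2015-04-24 00:33:00", 1, "login")
    ).toDF("Name", "LastTime", "Duration", "Status")

    mergePreferB(dfa, dfb).show()
    spark.stop()
  }
}
```

Note that coalesce works column by column: if a matching row in B holds a null in some column, the value from A shows through for that column. If whole rows from B should always replace rows from A, a "left_anti" join of A against B followed by a union with B is an alternative worth considering.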

Solution 1
1 2018-04-06 04:08:31
