Join a DF with another two with conditions - Scala Spark

I am trying to join a DataFrame with one of two other DataFrames, depending on a condition. I have the following DataFrames: DF1 is the one I want to join with either df_cond1 or df_cond2.

If the InfoNum column of DF1 is NBC, I want to join with df_cond1; if it is BBC, I want to join with df_cond2. I don't know how to do this.

DF1
+-------------+----------+-------------+
|  Date       | InfoNum  |   Sport     |
+-------------+----------+-------------+
|  31/11/2020 |   NBC    |  football   | 
|  11/01/2020 |   BBC    |  tennis     |
+-------------+----------+-------------+

df_cond1
+-------------+---------+-------------+
| Periodicity |   Info  | Description |
+-------------+---------+-------------+
|  Monthly    |  NBC    | DATAquality |
+-------------+---------+-------------+

df_cond2
+-------------+---------+-------------+
| Periodicity |   Info  | Description |
+-------------+---------+-------------+
|  Daily      |  BBC    | InfoIndeed  |
+-------------+---------+-------------+

final_df (expected result)
+-------------+----------+-------------+-------------+
|  Date       | InfoNum  |   Sport     | Description |
+-------------+----------+-------------+-------------+
|  31/11/2020 |   NBC    |  football   | DATAquality | 
|  11/01/2020 |   BBC    |  tennis     | InfoIndeed  |
+-------------+----------+-------------+-------------+

I have searched but have not found a good solution. Can you help me?

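Because df_cond1 and df_cond2 have the same schema, you can union them into a single lookup table and join DF1 against it once on InfoNum = Info: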

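// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
import spark.implicits._ // enables toDF and the $"col" syntax

val df = Seq(
  ("31/11/2020", "NBC", "football"),
  ("1/01/2020", "BBC", "tennis")
).toDF("Date", "InfoNum", "Sport")

val df_cond1 = Seq(
  ("Monthly", "NBC", "DATAquality")
).toDF("Periodicity", "Info", "Description")

val df_cond2 = Seq(
  ("Daily", "BBC", "InfoIndeed")
).toDF("Periodicity", "Info", "Description")

// Stack the two lookup tables, then do one inner join on the key.
df.join(df_cond1.union(df_cond2), $"InfoNum" === $"Info")
  .drop("Info", "Periodicity") // keep only Description from the lookup side
  .show(false)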

Output:

+----------+-------+--------+-----------+
|Date      |InfoNum|Sport   |Description|
+----------+-------+--------+-----------+
|31/11/2020|NBC    |football|DATAquality|
|1/01/2020 |BBC    |tennis  |InfoIndeed |
+----------+-------+--------+-----------+
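
The union approach relies on each lookup table containing only its own key. If you need the condition spelled out literally (NBC rows may only match df_cond1, BBC rows may only match df_cond2), one possible sketch is two left joins followed by coalesce. The names ConditionalJoin, conditionalJoin, Info1, Info2, Desc1 and Desc2 are made up for this illustration and are not part of the question or of Spark; the local[*] session is likewise just an assumption for running it standalone.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col}

object ConditionalJoin {
  // Hypothetical helper: picks the lookup table based on the value of InfoNum.
  def conditionalJoin(df: DataFrame, cond1: DataFrame, cond2: DataFrame): DataFrame = {
    // Rename the lookup columns so the two tables do not clash after joining.
    val c1 = cond1.select(col("Info").as("Info1"), col("Description").as("Desc1"))
    val c2 = cond2.select(col("Info").as("Info2"), col("Description").as("Desc2"))

    df
      // NBC rows can only pick up a description from cond1 ...
      .join(c1, col("InfoNum") === "NBC" && col("InfoNum") === col("Info1"), "left")
      // ... and BBC rows only from cond2.
      .join(c2, col("InfoNum") === "BBC" && col("InfoNum") === col("Info2"), "left")
      // At most one of Desc1/Desc2 is non-null for a given row.
      .withColumn("Description", coalesce(col("Desc1"), col("Desc2")))
      .select("Date", "InfoNum", "Sport", "Description")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only so the sketch runs on its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("conditional-join")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("31/11/2020", "NBC", "football"),
      ("1/01/2020", "BBC", "tennis")
    ).toDF("Date", "InfoNum", "Sport")

    val dfCond1 = Seq(("Monthly", "NBC", "DATAquality"))
      .toDF("Periodicity", "Info", "Description")
    val dfCond2 = Seq(("Daily", "BBC", "InfoIndeed"))
      .toDF("Periodicity", "Info", "Description")

    conditionalJoin(df, dfCond1, dfCond2).show(false)
    spark.stop()
  }
}
```

Because these are left joins, a DF1 row whose InfoNum is neither NBC nor BBC is kept with a null Description, whereas the inner join above would drop it. Pick whichever behaviour fits your data.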
