Is there a way to join two Spark dataframes with a custom join condition for each row?

I have two dataframes, df and df2, as shown below:

+------+---+----+
|  name|age|city|
+------+---+----+
|  John| 25|  LA|
|  Jane| 26|  LA|
|Joseph| 28|  SF|
+------+---+----+

+---+----+------+
|age|city|salary|
+---+----+------+
| 25|  LA| 40000|
| 26|    | 50000|
|   |  SF| 60000|
+---+----+------+

I want my result dataframe as below

+------+---+----+------+
|  name|age|city|salary|
+------+---+----+------+
|  John| 25|  LA| 40000|
|  Jane| 26|  LA| 50000|
|Joseph| 28|  SF| 60000|
+------+---+----+------+

Basically, I need to join on age and city, but if either of those columns is empty in a row of df2, that row should be joined on the other, non-null column only. The solution should also work when there are around 5 join columns: for each row of df2, only its non-null columns should take part in the join.

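You can put more conditions into the join so that a null column in the second dataframe matches anything, and then select the columns you need. If a row can match more than once, a groupBy is needed afterwards.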

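// Assumes df2's columns were renamed to age2, city2 and salary2,
// and that spark.implicits._ is imported for the $"..." syntax.
df1.join(df2,
    ($"age" === $"age2" || $"age2".isNull) &&
    ($"city" === $"city2" || $"city2".isNull), "left")
   .show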

The result will be:

+------+---+----+----+-----+-------+
|  name|age|city|age2|city2|salary2|
+------+---+----+----+-----+-------+
|  John| 25|  LA|  25|   LA|  40000|
|  Jane| 26|  LA|  26| null|  50000|
|Joseph| 28|  SF|null|   SF|  60000|
+------+---+----+----+-----+-------+
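
For the "around 5 columns" case, the same condition can be built from a list of key names instead of being written out by hand. This is only a minimal sketch under some assumptions: the blank cells in df2 are real nulls (not empty strings), both dataframes use the same key column names, Joseph's city is SF as in the expected result, and `nullTolerantJoin` is a hypothetical helper name, not a Spark API.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("null-tolerant-join")
  .getOrCreate()
import spark.implicits._

// Sample data from the question; None stands for the blank cells in df2.
val df1 = Seq(("John", 25, "LA"), ("Jane", 26, "LA"), ("Joseph", 28, "SF"))
  .toDF("name", "age", "city")
val df2 = Seq[(Option[Int], Option[String], Int)](
  (Some(25), Some("LA"), 40000),
  (Some(26), None, 50000),
  (None, Some("SF"), 60000)
).toDF("age", "city", "salary")

// Hypothetical helper: a key column takes part in the match
// only for rows where the right-hand side value is non-null.
def nullTolerantJoin(left: DataFrame, right: DataFrame, keys: Seq[String]): DataFrame = {
  val l = left.alias("l")
  val r = right.alias("r")
  // One clause per key, all clauses AND-ed together.
  val cond: Column = keys
    .map(k => col(s"r.$k").isNull || col(s"l.$k") === col(s"r.$k"))
    .reduce(_ && _)
  // Keep every left column plus the non-key columns of the right side.
  val rightOnly = right.columns.filterNot(c => keys.contains(c)).map(c => col(s"r.$c"))
  l.join(r, cond, "left").select(col("l.*") +: rightOnly: _*)
}

val result = nullTolerantJoin(df1, df2, Seq("age", "city"))
result.show()
```

Adding a key is then just a longer `keys` list. Note that a df2 row whose keys are all null would match every row of df1, and that a condition like this is not an equi-join, so Spark may fall back to a nested-loop join on large inputs.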

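But when you have more join columns, or the second dataframe has more null values, a row of the first dataframe can match several rows of the second, so the result becomes more complex and has to be reduced afterwards.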
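
One possible way to handle several matches is to keep, for each row of the first dataframe, the match with the most non-null key columns. The sketch below uses a window with `row_number` rather than a groupBy, which is a swapped-in technique and not what the answer above prescribes. The `people` and `salaries` data are hypothetical, and it assumes that (name, age, city) identifies a row of the first dataframe.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number, when}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("most-specific-match")
  .getOrCreate()
import spark.implicits._

val people = Seq(("John", 25, "LA"), ("Jane", 26, "LA"))
  .toDF("name", "age", "city")
// Hypothetical lookup rows: Jane matches both of them, John only the second.
val salaries = Seq[(Option[Int], Option[String], Int)](
  (Some(26), Some("LA"), 50000),
  (None, Some("LA"), 45000)
).toDF("age2", "city2", "salary")

// Pairs of (left key, right key).
val keys = Seq("age" -> "age2", "city" -> "city2")

// A right-hand key that is null matches any left-hand value.
val cond = keys
  .map { case (l, r) => col(r).isNull || col(l) === col(r) }
  .reduce(_ && _)

// Number of non-null keys on the right side: higher means a more specific match.
val specificity = keys
  .map { case (_, r) => when(col(r).isNotNull, 1).otherwise(0) }
  .reduce(_ + _)

// Rank the matches of each left row and keep only the most specific one.
val byPerson = Window.partitionBy("name", "age", "city").orderBy(specificity.desc)
val best = people.join(salaries, cond, "left")
  .withColumn("rank", row_number().over(byPerson))
  .filter(col("rank") === 1)
  .select("name", "age", "city", "salary")

best.show()
```

If two matches have the same number of non-null keys, `row_number` picks one of them arbitrarily, so a real implementation would need an explicit tie-breaker.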

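Another option is an inner join that matches when either column is equal:

df1.join(df2, df1.col("age") === df2.col("age") || df1.col("city") === df2.col("city"))
  .select(df1.col("name"), df1.col("age"), df1.col("city"), df2.col("salary"))
  .show

+----+---+----+------+
|name|age|city|salary|
+----+---+----+------+
|john| 25|  LA| 40000|
|Jane| 26|  LA| 40000|
|Jane| 26|  LA| 50000|
+----+---+----+------+

Note that Jane appears twice here, because she matches the first row of df2 on city and the second on age. Joseph is missing because this is an inner join and, in the data this output was apparently produced from, his city is SA, which matches no row of df2.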
