
Join Dataframes dynamically using Spark Scala when JOIN columns differ

Dynamically select multiple columns while joining different Dataframe in scala spark

Using the approach from the question linked above, I got the join expression working. But when the column names differ between the two DataFrames, Seq(columns) cannot be used, and the join condition has to be built dynamically. Here left_ds and right_ds are the DataFrames I want to join, on id = acc_id and acc_no = number.

left_ds => id,acc_no,name,ph

right_ds => acc_id,number,location

val joinKeys = "id,acc_id|acc_no,number"
val joinKeyPair: Array[(String, String)] = joinKeys.split("\\|").map(_.split(",")).map(x => x(0).toUpperCase -> x(1).toUpperCase)

val joinExpr: Column = joinKeyPair.map { case (ltable_col, rtable_col) => left_ds.col(ltable_col) === right_ds.col(rtable_col) }.reduce(_ and _)

left_ds.join(right_ds, joinExpr, "left_outer")

The code above is the join expression I tried, but it is not working. Is there a way to do this without Seq when the join column names differ? The code should keep working dynamically as the number of join keys grows.

It should work fine with aliases:

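import org.apache.spark.sql.functions.col

// Split "id,acc_id|acc_no,number" into pairs: Array(Array("id", "acc_id"), Array("acc_no", "number"))
val conditionArrays = joinKeys.split("\\|").map(c => c.split(","))
// Build one equality per pair against the aliases "a" (left) and "b" (right), then AND them together
val joinExpr = conditionArrays.map { case Array(a, b) => col("a." + a) === col("b." + b) }.reduce(_ and _)
left_ds.alias("a").join(right_ds.alias("b"), joinExpr, "left_outer")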
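
For completeness, here is a minimal end-to-end sketch of the alias approach, assuming Spark is on the classpath and a local SparkSession is acceptable. The object name DynamicJoinSketch, the helpers parseJoinKeys and buildJoinExpr, and the sample rows are all made up for illustration; only the column names and the joinKeys string come from the question.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

object DynamicJoinSketch {
  // Hypothetical helper: turn "id,acc_id|acc_no,number" into
  // Seq(("id", "acc_id"), ("acc_no", "number")), rejecting malformed pairs.
  def parseJoinKeys(spec: String): Seq[(String, String)] =
    spec.split("\\|").toSeq.map { pair =>
      pair.split(",").map(_.trim) match {
        case Array(l, r) if l.nonEmpty && r.nonEmpty => (l, r)
        case _ => throw new IllegalArgumentException(s"Malformed join key pair: '$pair'")
      }
    }

  // Hypothetical helper: one equality per key pair, qualified by the
  // DataFrame aliases, combined with AND.
  def buildJoinExpr(pairs: Seq[(String, String)], leftAlias: String, rightAlias: String): Column = {
    require(pairs.nonEmpty, "at least one join key pair is required")
    pairs
      .map { case (l, r) => col(s"$leftAlias.$l") === col(s"$rightAlias.$r") }
      .reduce(_ && _)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("dynamic-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data invented for this sketch; schemas follow the question.
    val leftDs = Seq((1, "A100", "alice", "555-0100"), (2, "B200", "bob", "555-0101"))
      .toDF("id", "acc_no", "name", "ph")
    val rightDs = Seq((1, "A100", "Pune"))
      .toDF("acc_id", "number", "location")

    val joinExpr = buildJoinExpr(parseJoinKeys("id,acc_id|acc_no,number"), "a", "b")
    val joined = leftDs.alias("a").join(rightDs.alias("b"), joinExpr, "left_outer")

    joined.show()
    spark.stop()
  }
}
```

Because the join is a left outer join, the left row with no match on both keys is kept, with nulls in the right-hand columns. Adding a third key pair only means appending another "|left_col,right_col" segment to the string; the helpers do not change.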
