
Using a CASE statement in a Spark DataFrame join condition

How can I use a CASE condition while joining two DataFrames in Spark?

    var date_a = s"CASE WHEN month(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy')))) 
    IN  (01,02,03) THEN CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy'))))-1,'-')
    ,substr(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy')))),3,4)) 
    ELSE CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy')))),'-'),
    SUBSTR(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy'))))+1,3,4)) END"

    val gstr1_amend = df1.join(gstr1_amend_lkup_data, df1("date_b") === df2(date_a))

But I am getting an error saying the CASE expression is not a column.

Instead of putting the CASE statement in the join condition, build the same logic with the when and otherwise functions inside withColumn, and then use that derived column in the join condition, like below.

    val df2 = somedf
      .withColumn("date_a", when([...]).otherwise([...])) // [...] is your case statement logic

    val gstr1_amend = df1.join(df2, df1("date_b") === df2("date_a"))
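For example, here is a minimal sketch of the question's fiscal-year CASE rewritten with when/otherwise, assuming dt holds 'dd-MM-yyyy' strings and somedf, df1 and date_b come from the question; everything else is illustrative:

    import org.apache.spark.sql.functions._

    // Parse the 'dd-MM-yyyy' string once and reuse it.
    val parsed = to_date(from_unixtime(unix_timestamp(col("dt"), "dd-MM-yyyy")))
    val yr = year(parsed)

    // Fiscal-year label: Jan-Mar belong to the previous year's label,
    // e.g. 15-02-2020 -> "2019-20", 01-04-2020 -> "2020-21".
    val df2 = somedf.withColumn(
      "date_a",
      when(month(parsed).isin(1, 2, 3),
        concat((yr - 1).cast("string"), lit("-"), substring(yr.cast("string"), 3, 2)))
        .otherwise(concat(yr.cast("string"), lit("-"), substring((yr + 1).cast("string"), 3, 2)))
    )

    val gstr1_amend = df1.join(df2, df1("date_b") === df2("date_a"))

Alternatively, the original SQL string can be parsed into a Column with expr(date_a) (also in org.apache.spark.sql.functions) and used directly in the join condition; the withColumn version above just keeps the join itself readable.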

I had a similar situation with a minor difference: I wanted to use the column from the second DataFrame when the column from the first DataFrame was blank, and only at join time. I couldn't use a CASE in the join itself, so I joined on another key column and applied the CASE logic in a filter instead. It isn't an elegant solution, but it works.
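A rough sketch of that workaround (all names here are hypothetical: dfA and dfB share a key column, col_a may be blank with col_b as the fallback, and target is whatever the chosen value is compared against):

    import org.apache.spark.sql.functions._

    // Join on a plain key column first...
    val joined = dfA.join(dfB, dfA("key") === dfB("key"))

    // ...then express the CASE as when/otherwise inside a filter:
    // fall back to dfB("col_b") whenever dfA("col_a") is blank.
    val result = joined.filter(
      when(dfA("col_a").isNull || dfA("col_a") === "", dfB("col_b"))
        .otherwise(dfA("col_a")) === dfA("target")
    )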
