How can I use a CASE condition when joining two DataFrames in Spark?
var date_a = s"""CASE WHEN month(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy'))))
IN (01,02,03) THEN CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy'))))-1,'-')
,substr(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy')))),3,4))
ELSE CONCAT(CONCAT(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy')))),'-'),
SUBSTR(year(to_date(from_unixtime(unix_timestamp(dt, 'dd-MM-yyyy'))))+1,3,4)) END"""
val gstr1_amend = df1.join(gstr1_amend_lkup_data, df1("date_b") === df2(date_a))
But I am getting an error saying the CASE expression is not a column.
Instead of putting the CASE statement in the join condition, build the same logic with the when and otherwise functions inside withColumn, and then use that column in the join condition, like below.
val df2 = somedf
.withColumn("date_a",when([...]).otherwise([...])) // [...] is your case statement logic
val gstr1_amend = df1.join(df2, df1("date_b") === df2("date_a"))
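To make the answer concrete, here is what the original CASE expression computes: a fiscal-year label such as "2019-20" for dates in January–March 2020, and "2020-21" otherwise. The sketch below expresses that branching in plain Scala with java.time; the function name fiscalYearLabel is hypothetical, and the dd-MM-yyyy input format is assumed from the question. Inside Spark, the same two branches would become the when and otherwise calls shown above.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Plain-Scala sketch of the CASE logic from the question (assumes dt is "dd-MM-yyyy").
// Months 1-3 belong to the fiscal year that started the previous calendar year.
def fiscalYearLabel(dt: String): String = {
  val d = LocalDate.parse(dt, DateTimeFormatter.ofPattern("dd-MM-yyyy"))
  val y = d.getYear
  if (d.getMonthValue <= 3) s"${y - 1}-${y.toString.substring(2)}"   // e.g. "2019-20"
  else s"$y-${(y + 1).toString.substring(2)}"                        // e.g. "2020-21"
}

fiscalYearLabel("15-02-2020") // "2019-20"
fiscalYearLabel("10-07-2020") // "2020-21"
```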
I had a similar situation with a minor difference: I wanted to use a column from the second DataFrame whenever the corresponding column from the first was blank, and only at join time. I couldn't use a CASE in the join condition, so I joined on another key column and applied the CASE logic in a filter afterwards. It isn't an elegant solution, but it works.
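The workaround I mean can be sketched like this, using plain Scala collections to stand in for the two DataFrames (the row types and sample data below are hypothetical): join on the shared key only, then apply the CASE-style fallback to the joined rows.

```scala
// Toy stand-ins for the two DataFrames (names and data are made up).
case class LeftRow(key: String, value: String)   // value may be blank
case class RightRow(key: String, value: String)

val leftRows  = Seq(LeftRow("a", ""), LeftRow("b", "x"))
val rightRows = Seq(RightRow("a", "y"), RightRow("b", "z"))

// Step 1: join on the shared key column only.
// Step 2: after the join, pick the right-hand value when the left one is blank,
//         which is the CASE logic moved out of the join condition.
val joined = for {
  l <- leftRows
  r <- rightRows if l.key == r.key
} yield (l.key, if (l.value.isEmpty) r.value else l.value)
```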