I need to apply a condition: if ReasonCode is "YES", use ProcessDate as one of the PARTITION BY columns; otherwise do not.
The equivalent SQL query is below:
SELECT PNum,
       SUM(SIAmt) OVER (
           PARTITION BY PNum,
                        ReasonCode,
                        CASE WHEN ReasonCode = 'YES' THEN ProcessDate ELSE NULL END
           ORDER BY ProcessDate
           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS SumAmt
FROM TABLE1
So far I have tried the query below, but I am unable to incorporate the condition
"CASE WHEN ReasonCode = 'YES' THEN ProcessDate ELSE NULL END" into the Spark DataFrame API:
val df = inputDF.select("PNum")
.withColumn("SumAmt", sum("SIAmt").over(Window.partitionBy("PNum","ReasonCode").orderBy("ProcessDate")))
Input Data:
---------------------------------------
Pnum ReasonCode ProcessDate SIAmt
---------------------------------------
1 No 1/01/2016 200
1 No 2/01/2016 300
1 Yes 3/01/2016 -200
1 Yes 4/01/2016 200
---------------------------------------
Expected Output:
---------------------------------------------
Pnum ReasonCode ProcessDate SIAmt SumAmt
---------------------------------------------
1 No 1/01/2016 200 200
1 No 2/01/2016 300 500
1 Yes 3/01/2016 -200 -200
1 Yes 4/01/2016 200 200
---------------------------------------------
Any suggestions on how to do this with the Spark DataFrame API instead of a spark-sql query?
You can express the same SQL directly through the DataFrame API:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
val df = inputDF
  .withColumn("SumAmt", sum("SIAmt").over(
    Window
      .partitionBy(
        col("PNum"),
        col("ReasonCode"),
        when(col("ReasonCode") === "Yes", col("ProcessDate")).otherwise(null))
      .orderBy("ProcessDate")))
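When orderBy is used without an explicit frame, Spark defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which matches the SQL above. You can add the .rowsBetween(Long.MinValue, 0)
part too if you want a row-based frame instead; with this data both should give you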
+----+----------+-----------+-----+------+
|Pnum|ReasonCode|ProcessDate|SIAmt|SumAmt|
+----+----------+-----------+-----+------+
| 1| Yes| 4/01/2016| 200| 200|
| 1| No| 1/01/2016| 200| 200|
| 1| No| 2/01/2016| 300| 500|
| 1| Yes| 3/01/2016| -200| -200|
+----+----------+-----------+-----+------+
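For reference, here is a minimal self-contained sketch that should reproduce this on the sample data from the question. The object name ConditionalPartitionExample, the helpers sampleData and withRunningSum, and the local[1] session are assumptions made for illustration, not part of the original code. ProcessDate is kept as a plain string, which only sorts correctly for this tiny sample; real data would probably need to be parsed into a date first.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum, when}

object ConditionalPartitionExample {

  // Hypothetical helper: builds the four sample rows from the question.
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1, "No", "1/01/2016", 200),
      (1, "No", "2/01/2016", 300),
      (1, "Yes", "3/01/2016", -200),
      (1, "Yes", "4/01/2016", 200)
    ).toDF("PNum", "ReasonCode", "ProcessDate", "SIAmt")
  }

  // Hypothetical helper: adds SumAmt using a conditional partition column.
  def withRunningSum(df: DataFrame): DataFrame = {
    val w = Window
      .partitionBy(
        col("PNum"),
        col("ReasonCode"),
        // when(...) without otherwise(...) yields null for non-matching rows,
        // so all "No" rows share one partition key here.
        when(col("ReasonCode") === "Yes", col("ProcessDate")))
      // With orderBy and no explicit frame, the default frame is
      // RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
      .orderBy("ProcessDate")

    df.withColumn("SumAmt", sum("SIAmt").over(w))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: local run for demonstration only
      .appName("conditional-partition")
      .getOrCreate()

    withRunningSum(sampleData(spark)).orderBy("ProcessDate").show()
    spark.stop()
  }
}
```

Sorted by ProcessDate, the SumAmt column should read 200, 500, -200, 200, which matches the expected output in the question: the two "No" rows accumulate together, while each "Yes" row falls into its own partition.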