Using a SQL WITH ... PARTITION BY query on a Spark SQL DataFrame

Question

I have a SQL query like this:

WITH cte AS
(
   SELECT *,
         ROW_NUMBER() OVER (PARTITION BY [date] ORDER BY TradedVolumSum DESC) AS rn
   FROM tempTrades
)
SELECT *
FROM cte
WHERE rn = 1

and I want to use it in Spark SQL to query my DataFrame.

My DataFrame looks like:

I want to keep only the row with the maximum TradedVolumSum for each day, together with its SecurityDescription, so I want to see something like:

How would I reproduce the same behaviour with Spark SQL in Python?

Thanks!

Answer 1

The code below does this, assuming your DataFrame is named tempTrades:

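import pyspark.sql.functions as F
from pyspark.sql import Window

# Rank the rows within each day, highest TradedVolumSum first.
# The square brackets in [date] are T-SQL quoting, not part of the name,
# so the column is referenced as "date" (assuming that is its name in the DataFrame).
win_temp = Window.partitionBy(F.col("date")).orderBy(F.col("TradedVolumSum").desc())

# Keep only the top-ranked row for each day.
result = tempTrades.withColumn(
    "rn",
    F.row_number().over(win_temp)
).filter(
    F.col("rn") == 1
)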

solution1
1 ACCEPTED 2018-09-02 16:52:52