I have a SQL query like this:
WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [date] ORDER BY TradedVolumSum DESC) AS rn
FROM tempTrades
)
SELECT *
FROM cte
WHERE rn = 1
I want to use it in Spark SQL to query my DataFrame: for each day, keep only the row with the maximum TradedVolumSum together with its SecurityDescription. How would I simulate the same behaviour in Spark SQL in Python?
Thanks!
Below is the code for your problem, assuming your DataFrame is named tempTrades:
import pyspark.sql.functions as F
from pyspark.sql import Window

# Partition by day and rank rows by traded volume, highest first.
# Note: "[date]" is T-SQL quoting; in PySpark just use the plain column name.
win_temp = Window.partitionBy("date").orderBy(F.col("TradedVolumSum").desc())

result = tempTrades.withColumn(
    "rn",
    F.row_number().over(win_temp)
).filter(
    F.col("rn") == 1
)