Using a SQL WITH (CTE) and PARTITION BY query on a Spark SQL dataframe
I have a SQL query like this:
WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY [date] ORDER BY TradedVolumSum DESC) AS rn
FROM tempTrades
)
SELECT *
FROM cte
WHERE rn = 1
and I want to use it in Spark SQL to query my dataframe.
My dataframe looks like:
I want to keep only the row with the maximum TradedVolumSum for each day, together with its SecurityDescription, so I want to see something like:
How would I reproduce the same behaviour with Spark SQL in Python?
Thanks!
Below is the code for your problem, assuming your dataframe is named tempTrades:
import pyspark.sql.functions as F
from pyspark.sql import Window

# One partition per date, rows ordered by traded volume, highest first.
# Note: the square brackets in [date] are T-SQL quoting and are not valid
# in Spark; use the plain column name instead.
win_temp = Window.partitionBy("date").orderBy(F.col("TradedVolumSum").desc())

# Number the rows within each day and keep only the top-volume row
tempTrades.withColumn(
    "rn",
    F.row_number().over(win_temp)
).filter(
    F.col("rn") == 1
)