
Migrating window functions from SQL to Spark Scala

Here is a SQL expression that I am trying to migrate to Spark Scala.

SELECT
 a.senderId,
 b.company_id,
 ROW_NUMBER() OVER(PARTITION BY a.senderId ORDER BY b.chron_rank) AS rnk
FROM df1 a
JOIN df2 b
ON a.senderId = b.member_id
WHERE a.datepartition BETWEEN concat(b.start_date,'-00') AND concat(b.end_date,'-00') 

I'm a bit lost with window functions; I started with something like this:

val temp = df2.join(df1, $"dimPosition.member_id" === $"df1.senderId")
    .select($"df1.senderId", $"df2.company_id")
    .......
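A quick baseline first: since the question starts from working SQL, one low-effort path is to register both DataFrames as temp views and run the original query unchanged through spark.sql. This is only a sketch, assuming a SparkSession named spark is in scope; it gives a reference result that any DataFrame rewrite can be checked against.

// Minimal sketch: run the original SQL as-is, assuming `spark` is an active SparkSession.
df1.createOrReplaceTempView("df1")
df2.createOrReplaceTempView("df2")

val reference = spark.sql("""
  SELECT a.senderId, b.company_id,
         ROW_NUMBER() OVER(PARTITION BY a.senderId ORDER BY b.chron_rank) AS rnk
  FROM df1 a
  JOIN df2 b ON a.senderId = b.member_id
  WHERE a.datepartition BETWEEN concat(b.start_date,'-00') AND concat(b.end_date,'-00')
""")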

Try this -

df2.as("b")
      .join(df1.as("a"), $"a.senderId" === $"b.member_id" && $"a.datepartition".between(
        concat($"b.start_date",lit("-00")), concat($"b.end_date", lit("-00")))
      )
      .selectExpr("a.senderId",
        "b.company_id",
        "ROW_NUMBER() OVER(PARTITION BY a.senderId ORDER BY b.chron_rank) AS rnk")

Try this... though you may run into a problem with the where clause...

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{concat, lit, row_number}

val temp = df2.as("b").join(df1.as("a"), $"b.member_id" === $"a.senderId")
  .where($"a.datepartition".between( // filter before ranking, like SQL's WHERE
    concat($"b.start_date", lit("-00")), concat($"b.end_date", lit("-00"))))
  .withColumn("rnk", row_number().over(
    Window.partitionBy($"a.senderId").orderBy($"b.chron_rank")))
  .select($"a.senderId", $"b.company_id", $"rnk")
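Two deliberate changes from the snippet above: the filter runs before the window, so row_number only ranks rows that pass the date predicate (mirroring SQL, where WHERE is evaluated before window functions), and the final select comes last so the filter can still see datepartition, start_date, and end_date. As a hypothetical sanity check against the temp-view reference from earlier (assuming Spark 2.4+ for isEmpty):

// Both directions of except empty => the two results contain the same rows.
assert(temp.except(reference).isEmpty && reference.except(temp).isEmpty)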

