
Migrating window functions from SQL to Spark Scala

Here's a SQL expression that I'm trying to migrate to Spark Scala.

SELECT
 a.senderId,
 b.company_id,
 ROW_NUMBER() OVER(PARTITION BY a.senderId ORDER BY b.chron_rank) AS rnk
FROM df1 a
JOIN df2 b
ON a.senderId = b.member_id
WHERE a.datepartition BETWEEN concat(b.start_date,'-00') AND concat(b.end_date,'-00') 

I'm a little lost with the window function. I started with something like this:

val temp = df2.as("df2").join(df1.as("df1"), $"df2.member_id" === $"df1.senderId")
    .select($"df1.senderId", $"df2.company_id")
    .......

Try this:

df2.as("b")
      .join(df1.as("a"), $"a.senderId" === $"b.member_id" && $"a.datepartition".between(
        concat($"b.start_date",lit("-00")), concat($"b.end_date", lit("-00")))
      )
      .selectExpr("a.senderId",
        "b.company_id",
        "ROW_NUMBER() OVER(PARTITION BY a.senderId ORDER BY b.chron_rank) AS rnk")

Try this; you may face an issue with the where clause:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{concat, lit, row_number}

val temp = df2.as("b").join(df1.as("a"), $"b.member_id" === $"a.senderId")
  .where($"a.datepartition".between(
    concat($"b.start_date", lit("-00")), concat($"b.end_date", lit("-00"))))
  .withColumn("rnk", row_number().over(
    Window.partitionBy($"senderId").orderBy($"chron_rank")))
  .select($"senderId", $"company_id", $"rnk")
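Either snippet can be sanity-checked with a couple of hypothetical rows; the column names come from the question, but the values below are made up:

import spark.implicits._  // assumes a SparkSession named `spark`

// Toy inputs: each datepartition falls inside exactly one [start_date, end_date] range
val df1 = Seq(("s1", "2020-01-15-00"), ("s1", "2020-03-10-00"))
  .toDF("senderId", "datepartition")
val df2 = Seq(("s1", "c1", 1, "2020-01-01", "2020-02-01"),
              ("s1", "c2", 2, "2020-03-01", "2020-04-01"))
  .toDF("member_id", "company_id", "chron_rank", "start_date", "end_date")

// temp.show() should then print (s1, c1, rnk=1) and (s1, c2, rnk=2)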
