
How to generate cumulative concatenation in Spark SQL

My input for Spark is below:

Col_1  Col_2  Amount
1      0      35/310320
1      1      35/5
1      1      180/-310350
17     1      0/1000
17     17     0/-1000
17     17     74/314322
17     17     74/5
17     17     185/-3142

I want to generate the output below using Spark SQL:

Output
35/310320
35/310320/35/5
35/310320/35/5/180/-310350
0/1000
0/1000/0/-1000
0/1000/0/-1000/74/314322
0/1000/0/-1000/74/314322/74/5
0/1000/0/-1000/74/314322/74/5/185/-3142

Conditions & Procedure: If col_1 and col_2 values are not the same then consider the current amount value for the new Output column but both are the same then concatenate the previous all amount value by / .

i.e. for 17 in col_1, the first row has different col_1 and col_2 values, so take the current Amount 0/1000. In the next row both column values are the same, so the value becomes 0/1000/0/-1000, and so on. I need to build this logic for dynamic data in Spark SQL or Spark Scala.

You can use concat_ws on the list of amount values obtained from collect_list over an appropriate window:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, concat_ws}

val df2 = df.withColumn(
    "output", 
    concat_ws(
        "/", 
        // running list of amount values from the start of the group up to the current row
        collect_list("amount").over(
            Window.partitionBy("col_1")
                  .orderBy("col_2")
                  .rowsBetween(Window.unboundedPreceding, 0)
        )
    )
)
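
For reference, here is a minimal, self-contained sketch that rebuilds the sample input from the question and applies the same window expression; the local SparkSession setup and column names are assumptions for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, concat_ws}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Sample data taken from the question
val df = Seq(
  (1, 0, "35/310320"),
  (1, 1, "35/5"),
  (1, 1, "180/-310350"),
  (17, 1, "0/1000"),
  (17, 17, "0/-1000"),
  (17, 17, "74/314322"),
  (17, 17, "74/5"),
  (17, 17, "185/-3142")
).toDF("col_1", "col_2", "amount")

val df2 = df.withColumn(
  "output",
  concat_ws("/",
    // cumulative concatenation per col_1 group, ordered by col_2
    collect_list("amount").over(
      Window.partitionBy("col_1")
            .orderBy("col_2")
            .rowsBetween(Window.unboundedPreceding, 0)
    )
  )
)

df2.select("output").show(false)

Note that rows sharing the same col_2 value within a group have no guaranteed relative order, so if the original input order must be preserved exactly, an extra tiebreaker column (for example one generated with monotonically_increasing_id) may be needed in the orderBy.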
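
Since the question also asks for plain Spark SQL, a sketch of the same window expression written as a SQL statement, assuming the input has been registered as a temporary view named input_data (a hypothetical name):

df.createOrReplaceTempView("input_data")

val df3 = spark.sql("""
  SELECT col_1, col_2, amount,
         concat_ws('/',
           collect_list(amount) OVER (
             PARTITION BY col_1
             ORDER BY col_2
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           )
         ) AS output
  FROM input_data
""")

df3.show(false)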
