How to generate a cumulative concatenation in Spark SQL

Question

My input for Spark is below:

Col_1	Col_2	Amount
1	0	35/310320
1	1	35/5
1	1	180/-310350
17	1	0/1000
17	17	0/-1000
17	17	74/314322
17	17	74/5
17	17	185/-3142

I want to generate the output below using Spark SQL:

Output
35/310320
35/310320/35/5
35/310320/35/5/180/-310350
0/1000
0/1000/0/-1000
0/1000/0/-1000/74/314322
0/1000/0/-1000/74/314322/74/5
0/1000/0/-1000/74/314322/74/5/185/-3142

Conditions and procedure: if the col_1 and col_2 values differ, the new Output column takes the current Amount value as-is. If they are the same, Output is all previous Amount values followed by the current one, joined by / .

For example, take 17 in col_1. In its first row the col_1 and col_2 values differ, so Output is the current amount, 0/1000 . In the next row both columns hold the same value, so Output is 0/1000/0/-1000 , and so on. I need to implement this logic for dynamic data in Spark SQL or Spark Scala.
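To make the rule concrete before involving Spark, here is a minimal plain-Scala sketch of how I read it: start over whenever col_1 and col_2 differ, otherwise append the current amount to the running string. The names InputRow, CumulativeConcatSketch and cumulativeConcat are made up for illustration, and the sketch assumes the rows arrive in the order shown in the input table.

```scala
// Hypothetical row type for illustration; not part of any Spark API.
final case class InputRow(col1: Int, col2: Int, amount: String)

object CumulativeConcatSketch {
  // Walk the rows in input order, carrying the running string along.
  // A row whose col1 and col2 differ restarts the running string.
  def cumulativeConcat(rows: Seq[InputRow]): Seq[String] =
    rows.scanLeft("") { (acc, row) =>
      if (row.col1 != row.col2 || acc.isEmpty) row.amount
      else s"$acc/${row.amount}"
    }.tail // drop the empty seed value produced by scanLeft

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      InputRow(1, 0, "35/310320"),
      InputRow(1, 1, "35/5"),
      InputRow(1, 1, "180/-310350"),
      InputRow(17, 1, "0/1000"),
      InputRow(17, 17, "0/-1000")
    )
    cumulativeConcat(rows).foreach(println)
  }
}
```

This only pins down the expected behaviour on a local collection; it does not scale the way a window function in Spark does.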

Answer 1

You can use concat_ws on the list of amounts that collect_list gathers over an appropriate window:

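import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, concat_ws}

// Running window: every row of the same col_1 group, from the first row
// up to and including the current one.
// Note: rows with equal col_2 values have no guaranteed relative order.
val w = Window.partitionBy("col_1")
              .orderBy("col_2")
              .rowsBetween(Window.unboundedPreceding, Window.currentRow)

// Collect the amounts seen so far and join them with "/".
val df2 = df.withColumn("output", concat_ws("/", collect_list("amount").over(w)))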
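Since the question also asks about Spark SQL, the same window can probably be written as a query. The sketch below is one way to do it, under a few assumptions that are not in the question: the data is registered as a temp view called input_rows, and there is an extra seq column recording the original row order. Without some such column, rows that share a col_2 value can come back in any order, so the concatenation order within a group is not guaranteed. The object and method names are also invented for the example.

```scala
import org.apache.spark.sql.SparkSession

object CumulativeConcatSql {
  // Returns the cumulative "output" strings in original row order.
  def run(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    // seq is a hypothetical ordering column added for this example.
    val df = Seq(
      (1, 1, 0, "35/310320"),
      (2, 1, 1, "35/5"),
      (3, 1, 1, "180/-310350"),
      (4, 17, 1, "0/1000"),
      (5, 17, 17, "0/-1000"),
      (6, 17, 17, "74/314322"),
      (7, 17, 17, "74/5"),
      (8, 17, 17, "185/-3142")
    ).toDF("seq", "col_1", "col_2", "amount")

    // input_rows is an assumed view name.
    df.createOrReplaceTempView("input_rows")

    val result = spark.sql(
      """SELECT seq, col_1, col_2, amount,
        |       concat_ws('/', collect_list(amount) OVER (
        |         PARTITION BY col_1
        |         ORDER BY col_2, seq
        |         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        |       )) AS output
        |FROM input_rows
        |ORDER BY seq""".stripMargin)

    result.select("output").as[String].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cumulative-concat-sketch")
      .getOrCreate()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```

Ordering by col_2 and then seq keeps the window close to the one in the answer while breaking ties explicitly; if the real data has a timestamp or id column, that would be the natural thing to use instead of seq.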

Answer 1, dated 2021-03-22 18:20:19
