How to generate cumulative concatenation in Spark SQL
My input for Spark is below:
| Col_1 | Col_2 | Amount      |
|-------|-------|-------------|
| 1     | 0     | 35/310320   |
| 1     | 1     | 35/5        |
| 1     | 1     | 180/-310350 |
| 17    | 1     | 0/1000      |
| 17    | 17    | 0/-1000     |
| 17    | 17    | 74/314322   |
| 17    | 17    | 74/5        |
| 17    | 17    | 185/-3142   |
I want to generate the output below using Spark SQL:
| Output                                  |
|-----------------------------------------|
| 35/310320                               |
| 35/310320/35/5                          |
| 35/310320/35/5/180/-310350              |
| 0/1000                                  |
| 0/1000/0/-1000                          |
| 0/1000/0/-1000/74/314322                |
| 0/1000/0/-1000/74/314322/74/5           |
| 0/1000/0/-1000/74/314322/74/5/185/-3142 |
Conditions & procedure: if the `col_1` and `col_2` values are not the same, take the current `amount` value for the new output column; if they are the same, concatenate all the previous `amount` values with `/`.

For example, in the first row with `col_1` = 17, the `col_1` and `col_2` values differ, so the current amount `0/1000` is taken. In the next row both column values are the same, so the value becomes `0/1000/0/-1000`, and so on. I need to implement this logic for dynamic data in Spark SQL or Spark Scala.
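For reference, here is a minimal sketch that rebuilds this sample input as a DataFrame (the column names `col_1`, `col_2`, and `amount` are taken from the table above; an active `SparkSession` named `spark` is assumed):

```scala
import spark.implicits._  // enables .toDF on Scala collections

// Sample input reconstructed from the table above
val df = Seq(
  (1, 0, "35/310320"),
  (1, 1, "35/5"),
  (1, 1, "180/-310350"),
  (17, 1, "0/1000"),
  (17, 17, "0/-1000"),
  (17, 17, "74/314322"),
  (17, 17, "74/5"),
  (17, 17, "185/-3142")
).toDF("col_1", "col_2", "amount")
```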
You can use `concat_ws` on the list of amounts obtained from `collect_list` over an appropriate window:
```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{collect_list, concat_ws}

val df2 = df.withColumn(
  "output",
  concat_ws(
    "/",  // separator between the accumulated amounts
    collect_list("amount").over(
      Window.partitionBy("col_1")  // restart the accumulation for each col_1 group
        .orderBy("col_2")          // order rows within the group
        .rowsBetween(Window.unboundedPreceding, 0)  // all rows up to the current one
    )
  )
)
```
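Since the question also asks for plain Spark SQL, here is a sketch of the same window expression as a query; the view name `input_table` is an assumption, not something from the original post:

```scala
// Register the DataFrame so it can be queried with SQL;
// the view name "input_table" is an assumption
df.createOrReplaceTempView("input_table")

val df2Sql = spark.sql("""
  SELECT col_1, col_2, amount,
         concat_ws(
           '/',
           collect_list(amount) OVER (
             PARTITION BY col_1
             ORDER BY col_2
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           )
         ) AS output
  FROM input_table
""")
```

Note that rows sharing the same `col_2` value within a partition have no guaranteed order, so on real data you may want an additional unique ordering column (such as a row id) to make the concatenation deterministic.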