
Spark: do a calculation on a column based on old values from the previous row

I have a DataFrame as shown below:

+----------------+---------------+----------+------------------+-------------+
|Transaction_date|    Added  date|coupon_id |cart_value        | coupon_value|
+----------------+---------------+----------+------------------+-------------+
|2018-01-16      |2018-02-01     |2390324796|12.5              |1.8          |
|2018-01-16      |2018-01-04     |1100111212|1.0               |2.0          |
|2018-01-19      |2018-01-04     |1100111212|2.5               |2.0          |
+----------------+---------------+----------+------------------+-------------+

I need to apply the coupon value to the cart value and update both the remaining coupon balance and the amount automatically redeemed. This should only be done when the Transaction_date is greater than the coupon's Added date.

Logic:

UpdatedBalance = (coupon_value - cart_value); if cart_value is larger, only the available coupon value is redeemed.

Redeemed = how much was redeemed in the given transaction
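
To make the row-level arithmetic concrete, here is a minimal plain-Scala sketch of that logic (the helper name applyCoupon and its tuple return are illustrative assumptions, not part of the question):

// Hypothetical helper illustrating the stated logic: redeem at most the
// currently available coupon balance against the cart value.
// Returns (updatedBalance, redeemedInThisTransaction).
def applyCoupon(couponBalance: Double, cartValue: Double): (Double, Double) = {
  val redeemed = math.min(couponBalance, cartValue)
  (couponBalance - redeemed, redeemed)
}

// Example matching the second row above: a 2.0 coupon against a 1.0 cart
// leaves a balance of 1.0 and redeems 1.0 in this transaction.
applyCoupon(2.0, 1.0)   // (1.0, 1.0)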

I want something like this:

+----------------+---------------+----------+------------------+-------------+--------------+--------------+
|Transaction_date|    Added  date|coupon_id |cart_value        | coupon_value|UpdatedBalance|Redeemed      |
+----------------+---------------+----------+------------------+-------------+--------------+--------------+
|2018-01-16      |2018-02-01     |2390324796|12.5              |1.8          |0             |0             |
|2018-01-16      |2018-01-04     |1100111212|1.0               |2.0          |1             |1             |
|2018-01-19      |2018-01-04     |1100111212|2.5               |2.0          |0             |1             |
+----------------+---------------+----------+------------------+-------------+--------------+--------------+

I am trying this in Spark Scala.

Assuming the window is over the whole table, ordered by Added_date in descending order, the approach below works:

scala> val df =Seq(("2018-01-16","2018-02-01",2390324796L,12.5,1.8),("2018-01-16","2018-01-04",1100111212L,1.0,2.0),("2018-01-19","2018-01-04",1100111212L,2.5,2.0)).toDF("Transaction_date","Added_date","coupon_id","cart_value","coupon_value")
df: org.apache.spark.sql.DataFrame = [Transaction_date: string, Added_date: string ... 3 more fields]

scala> df.show(false)
+----------------+----------+----------+----------+------------+
|Transaction_date|Added_date|coupon_id |cart_value|coupon_value|
+----------------+----------+----------+----------+------------+
|2018-01-16      |2018-02-01|2390324796|12.5      |1.8         |
|2018-01-16      |2018-01-04|1100111212|1.0       |2.0         |
|2018-01-19      |2018-01-04|1100111212|2.5       |2.0         |
+----------------+----------+----------+----------+------------+


scala> val df2 = df.withColumn("UpdatedBalance",when('coupon_value>'cart_value,'coupon_value-'cart_value).otherwise(0))
df2: org.apache.spark.sql.DataFrame = [Transaction_date: string, Added_date: string ... 4 more fields]

scala> df2.show(false)
+----------------+----------+----------+----------+------------+--------------+
|Transaction_date|Added_date|coupon_id |cart_value|coupon_value|UpdatedBalance|
+----------------+----------+----------+----------+------------+--------------+
|2018-01-16      |2018-02-01|2390324796|12.5      |1.8         |0.0           |
|2018-01-16      |2018-01-04|1100111212|1.0       |2.0         |1.0           |
|2018-01-19      |2018-01-04|1100111212|2.5       |2.0         |0.0           |
+----------------+----------+----------+----------+------------+--------------+

scala> import org.apache.spark.sql.expressions._
import org.apache.spark.sql.expressions._


scala> df2.withColumn("Redeemed",sum('UpdatedBalance).over(Window.orderBy('Added_date.desc))).show(false)
19/01/03 10:31:50 WARN window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
+----------------+----------+----------+----------+------------+--------------+--------+
|Transaction_date|Added_date|coupon_id |cart_value|coupon_value|UpdatedBalance|Redeemed|
+----------------+----------+----------+----------+------------+--------------+--------+
|2018-01-16      |2018-02-01|2390324796|12.5      |1.8         |0.0           |0.0     |
|2018-01-16      |2018-01-04|1100111212|1.0       |2.0         |1.0           |1.0     |
|2018-01-19      |2018-01-04|1100111212|2.5       |2.0         |0.0           |1.0     |
+----------------+----------+----------+----------+------------+--------------+--------+


scala>
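
As a possible refinement (my own sketch, not part of the answer above): the window warning can be avoided by partitioning the running sum per coupon_id, and the question's "apply only when Transaction_date is greater than Added date" condition can be folded into the same when clause. The names perCoupon and df3 below are illustrative assumptions; ISO-formatted date strings compare correctly with the ">" operator.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lit, sum, when}

// Partition per coupon so Spark does not move all rows to a single partition.
val perCoupon = Window.partitionBy(col("coupon_id")).orderBy(col("Added_date").desc)

// Only redeem when the transaction happens after the coupon was added
// and the coupon covers more than the cart value.
val df3 = df
  .withColumn("UpdatedBalance",
    when(col("Transaction_date") > col("Added_date") &&
         col("coupon_value") > col("cart_value"),
      col("coupon_value") - col("cart_value")).otherwise(lit(0.0)))
  .withColumn("Redeemed", sum(col("UpdatedBalance")).over(perCoupon))

df3.show(false)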
