[英]How to add a column to the existing DataFrame and using window function to add specific rows in the new column using Scala/Spark 2.2
例如:我想按日期加上售出的数量。
Date Quantity
11/4/2017 20
11/4/2017 23
11/4/2017 12
11/5/2017 18
11/5/2017 12
输出带有新列:
Date Quantity, New_Column
11/4/2017 20 55
11/4/2017 23 55
11/4/2017 12 55
11/5/2017 18 30
11/5/2017 12 30
通过指定WindowSpec,简单地将sum
用作窗口函数:
import org.apache.spark.sql.expressions.Window
df.withColumn("New_Column", sum("Quantity").over(Window.partitionBy("Date"))).show
+---------+--------+----------+
| Date|Quantity|New_Column|
+---------+--------+----------+
|11/5/2017| 18| 30|
|11/5/2017| 12| 30|
|11/4/2017| 20| 55|
|11/4/2017| 23| 55|
|11/4/2017| 12| 55|
+---------+--------+----------+
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.