
Taking a sum in spark-scala based on a condition

I have a data frame like this. How can I take the sum of the Sales column over the rows where Rank is greater than 3, per M?

+---+-----+----+
|  M|Sales|Rank|
+---+-----+----+
| M1|  200|   1|
| M1|  175|   2|
| M1|  150|   3|
| M1|  125|   4|
| M1|   90|   5|
| M1|   85|   6|
| M2| 1001|   1|
| M2|  500|   2|
| M2|  456|   3|
| M2|  345|   4|
| M2|  231|   5|
| M2|  123|   6|
+---+-----+----+

Expected output:

+---+-----+----+---------------+
|  M|Sales|Rank|SumGreaterThan3|
+---+-----+----+---------------+
| M1|  200|   1|            300|
| M1|  175|   2|            300|
| M1|  150|   3|            300|
| M1|  125|   4|            300|
| M1|   90|   5|            300|
| M1|   85|   6|            300|
| M2| 1001|   1|            699|
| M2|  500|   2|            699|
| M2|  456|   3|            699|
| M2|  345|   4|            699|
| M2|  231|   5|            699|
| M2|  123|   6|            699|
+---+-----+----+---------------+

I have tried a windowed sum like this:

df.withColumn("SumGreaterThan3", sum("Sales").over(Window.partitionBy(col("M")))) // but this gives the total sum of Sales per M

To reproduce the same DataFrame:

val df = Seq(
("M1",200,1),
("M1",175,2),
("M1",150,3),
("M1",125,4),
("M1",90,5),
("M1",85,6),
("M2",1001,1),
("M2",500,2),
("M2",456,3),
("M2",345,4),
("M2",231,5),
("M2",123,6)
).toDF("M","Sales","Rank")

Partitioning by M is enough to define the window. You then need a conditional sum, which you get by combining sum and when.

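import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum, when}
// One window per value of M; a whole-partition sum needs no ordering.
val w = Window.partitionBy("M")
// Rows with Rank <= 3 contribute 0, so only Sales with Rank > 3 are summed.
df.withColumn("SumGreaterThan3", sum(when(col("Rank") > 3, col("Sales")).otherwise(0)).over(w)).show()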

This will give you the expected results.
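For anyone who wants to try this outside an existing session, here is a minimal sketch that builds the same DataFrame and computes the column two ways: with the window approach above, and with a filter, aggregate and join-back alternative that is not part of the original answer. The local SparkSession settings and the names `windowed`, `totals` and `joined` are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum, when}

// Assumption: a local session for experimentation; in spark-shell,
// getOrCreate() simply returns the session that already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("SumGreaterThan3Example") // arbitrary name
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("M1", 200, 1), ("M1", 175, 2), ("M1", 150, 3),
  ("M1", 125, 4), ("M1", 90, 5), ("M1", 85, 6),
  ("M2", 1001, 1), ("M2", 500, 2), ("M2", 456, 3),
  ("M2", 345, 4), ("M2", 231, 5), ("M2", 123, 6)
).toDF("M", "Sales", "Rank")

// Window approach: every row in a partition receives the same conditional sum.
val w = Window.partitionBy("M")
val windowed = df.withColumn(
  "SumGreaterThan3",
  sum(when(col("Rank") > 3, col("Sales")).otherwise(0)).over(w))

// Alternative sketch: aggregate only the qualifying rows, then join back on M.
val totals = df.filter(col("Rank") > 3)
  .groupBy("M")
  .agg(sum("Sales").as("SumGreaterThan3"))
val joined = df.join(totals, Seq("M"), "left")

windowed.orderBy("M", "Rank").show()
joined.orderBy("M", "Rank").show()
```

The two versions differ in one edge case: if a group has no rows with Rank greater than 3, the window version yields 0 (because of `otherwise(0)`), while the left join leaves the column null for that group.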
