
Replace the values based on condition in Spark

I have a dataset, and I want to replace the result column with the value from the row that has the least quantity, grouping by id and date:

id,date,quantity,result
1,2016-01-01,245,1
1,2016-01-01,345,3
1,2016-01-01,123,2
1,2016-01-02,120,5
2,2016-01-01,567,1
2,2016-01-01,568,1
2,2016-01-02,453,1

Here is the expected output: within each (id, date) group, result is replaced with the result of the row having the least quantity. Row ordering doesn't matter; any order is fine.

id,date,quantity,result
1,2016-01-01,245,2
1,2016-01-01,345,2
1,2016-01-01,123,2
1,2016-01-02,120,5
2,2016-01-01,567,1
2,2016-01-01,568,1
2,2016-01-02,453,1
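
For reproducibility, here is a minimal sketch that builds this sample DataFrame (assuming an active SparkSession bound to the name spark; the variable names are illustrative, not from the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample data from the question: (id, date, quantity, result)
data = [
    (1, '2016-01-01', 245, 1),
    (1, '2016-01-01', 345, 3),
    (1, '2016-01-01', 123, 2),
    (1, '2016-01-02', 120, 5),
    (2, '2016-01-01', 567, 1),
    (2, '2016-01-01', 568, 1),
    (2, '2016-01-02', 453, 1),
]
df = spark.createDataFrame(data, ['id', 'date', 'quantity', 'result'])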

Use a Window partitioned by id and date: first keep result only on the minimum-quantity row (a when without an otherwise leaves every other row null), then propagate that value across the group with max, which ignores nulls.

import pyspark.sql.functions as f
from pyspark.sql import Window

w = Window.partitionBy('id', 'date')

# Step 1: keep result only where quantity equals the group minimum;
# when() without otherwise() sets every other row's result to null.
# Step 2: max() over the same window ignores nulls, so the surviving
# value is broadcast to all rows in the group.
df.withColumn('result', f.when(f.col('quantity') == f.min('quantity').over(w), f.col('result'))) \
  .withColumn('result', f.max('result').over(w)).show(10, False)

+---+----------+--------+------+
|id |date      |quantity|result|
+---+----------+--------+------+
|1  |2016-01-02|120     |5     |
|1  |2016-01-01|245     |2     |
|1  |2016-01-01|345     |2     |
|1  |2016-01-01|123     |2     |
|2  |2016-01-02|453     |1     |
|2  |2016-01-01|567     |1     |
|2  |2016-01-01|568     |1     |
+---+----------+--------+------+
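
As a single-pass alternative sketch (my variant, not the answerer's code): Spark compares structs field by field, so taking the min of a (quantity, result) struct over the same window yields the result paired with the minimum quantity. Ties on quantity would be broken by result.

import pyspark.sql.functions as f
from pyspark.sql import Window

w = Window.partitionBy('id', 'date')

# min over a struct compares fields left to right, so the minimum is
# decided by quantity; the paired result field comes along with it.
df.withColumn('result',
              f.min(f.struct('quantity', 'result')).over(w)['result']) \
  .show(10, False)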
