Replace values in Spark based on a condition

Question

I have a dataset, and within each (id, date) group I want to replace the result column with the result of the row that has the smallest quantity.

id,date,quantity,result
1,2016-01-01,245,1
1,2016-01-01,345,3
1,2016-01-01,123,2
1,2016-01-02,120,5
2,2016-01-01,567,1
2,2016-01-01,568,1
2,2016-01-02,453,1

Here is the expected output: within each (id, date) group, every row's result is replaced by the result of the row with the smallest quantity. The order of the rows doesn't matter.

id,date,quantity,result
1,2016-01-01,245,2
1,2016-01-01,345,2
1,2016-01-01,123,2
1,2016-01-02,120,5
2,2016-01-01,567,1
2,2016-01-01,568,1
2,2016-01-02,453,1

Answer 1

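Use a Window partitioned by id and date. First keep result only on the row whose quantity equals the group minimum (when without otherwise leaves the other rows null), then spread that value over the whole group with max, which ignores nulls.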

import pyspark.sql.functions as f
from pyspark.sql import Window

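# One window per (id, date) group; no ordering is needed.
w = Window.partitionBy('id', 'date')

# Step 1: keep result only where quantity is the group minimum (other rows become null).
# Step 2: max over the window ignores nulls and copies that result to every row of the group.
df.withColumn('result', f.when(f.col('quantity') == f.min('quantity').over(w), f.col('result'))) \
  .withColumn('result', f.max('result').over(w)).show(10, False)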

+---+----------+--------+------+
|id |date      |quantity|result|
+---+----------+--------+------+
|1  |2016-01-02|120     |5     |
|1  |2016-01-01|245     |2     |
|1  |2016-01-01|345     |2     |
|1  |2016-01-01|123     |2     |
|2  |2016-01-02|453     |1     |
|2  |2016-01-01|567     |1     |
|2  |2016-01-01|568     |1     |
+---+----------+--------+------+
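
To make the two steps easier to follow, here is a minimal plain-Python sketch of the same logic, run on the sample rows from the question. It is not Spark code, and the function name `replace_result_with_min_quantity_result` is made up for illustration. It also shows what happens on a tie: if several rows share the minimum quantity, the largest of their result values is used, which is how I would expect the `max` step above to behave.

```python
from collections import defaultdict

# Sample rows from the question: (id, date, quantity, result)
rows = [
    (1, "2016-01-01", 245, 1),
    (1, "2016-01-01", 345, 3),
    (1, "2016-01-01", 123, 2),
    (1, "2016-01-02", 120, 5),
    (2, "2016-01-01", 567, 1),
    (2, "2016-01-01", 568, 1),
    (2, "2016-01-02", 453, 1),
]


def replace_result_with_min_quantity_result(rows):
    """Hypothetical helper mirroring the window logic in plain Python."""
    # Group (quantity, result) pairs by (id, date), like Window.partitionBy.
    groups = defaultdict(list)
    for id_, date, quantity, result in rows:
        groups[(id_, date)].append((quantity, result))

    chosen = {}
    for key, members in groups.items():
        min_quantity = min(q for q, _ in members)
        # Step 1: only rows at the minimum quantity keep their result.
        candidates = [r for q, r in members if q == min_quantity]
        # Step 2: take the max of the surviving values (ties pick the largest).
        chosen[key] = max(candidates)

    # Every row gets the result chosen for its group; row order is preserved.
    return [(id_, date, quantity, chosen[(id_, date)])
            for id_, date, quantity, _ in rows]


updated = replace_result_with_min_quantity_result(rows)
for row in updated:
    print(row)
```

If ties on quantity need a different rule (for example, the smallest result), the `max` in step 2 is the only part that would change.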
