How to apply groupBy and aggregate functions to a specific window in a PySpark DataFrame?
I would like to apply a groupBy and a subsequent agg function to a PySpark DataFrame, but only to a specific window. This is best illustrated by an example. Suppose that I have a dataset named df:
df.show()
+-----+----------+----------+-------+
|   ID| Timestamp| Condition|  Value|
+-----+----------+----------+-------+
|   z1|         1|         0|     50|
|-----------------------------------|
|   z1|         2|         0|     51|
|   z1|         3|         0|     52|
|   z1|         4|         0|     51|
|   z1|         5|         1|     51|
|   z1|         6|         0|     49|
|   z1|         7|         0|     44|
|   z1|         8|         0|     46|
|-----------------------------------|
|   z1|         9|         0|     48|
|   z1|        10|         0|     42|
+-----+----------+----------+-------+
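For reference, here is a minimal sketch that rebuilds this example DataFrame (it assumes an already-running SparkSession named spark, which is not part of the question itself):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumed local session, only for the example

data = [
    ('z1', 1, 0, 50), ('z1', 2, 0, 51), ('z1', 3, 0, 52), ('z1', 4, 0, 51),
    ('z1', 5, 1, 51), ('z1', 6, 0, 49), ('z1', 7, 0, 44), ('z1', 8, 0, 46),
    ('z1', 9, 0, 48), ('z1', 10, 0, 42),
]
df = spark.createDataFrame(data, ['ID', 'Timestamp', 'Condition', 'Value'])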
In particular, what I would like to do is to apply a window of +-3 rows around the row where column Condition == 1 (i.e. in this case, row 5). Within that window, as depicted in the DataFrame above, I would like to find the minimum value of column Value and the corresponding value of column Timestamp, thus obtaining:
+----------+----------+
| Min_value| Timestamp|
+----------+----------+
|        44|         7|
+----------+----------+
Does anyone know how this can be tackled? Many thanks in advance,
Marioanzas
You can use a window that spans between 3 preceding and 3 following rows, take the minimum, and filter on the condition. Because the minimum is taken over a struct('Value', 'Timestamp'), the fields are compared in order, so the Timestamp belonging to the minimum Value is carried along with it:
from pyspark.sql import functions as F, Window

df2 = df.withColumn(
    'min',
    F.min(
        F.struct('Value', 'Timestamp')
    ).over(Window.partitionBy('ID').orderBy('Timestamp').rowsBetween(-3, 3))
).filter('Condition = 1').select('min.*')
df2.show()
+-----+---------+
|Value|Timestamp|
+-----+---------+
|   44|        7|
+-----+---------+
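Since select('min.*') keeps the struct's field names, you can add an explicit alias if you want the exact Min_value / Timestamp layout from the question (a small, optional follow-up to the answer above):

df2.select(F.col('Value').alias('Min_value'), 'Timestamp').show()

Note also that partitionBy('ID') keeps the +-3 row window from spilling across rows that belong to a different ID.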