
Arithmetic subtraction in group aggregation in PySpark

I have the following dataframe:

ID val1 val2 val3 ...
1   4    1    3   ...
1   5    4    8   ...
2   6    3    6   ...
2   9    2    2   ...
3   2    1    4   ...
3   1    1    4   ...

I need to group/aggregate by ID and subtract the values within each group (first row minus last row), producing the following output:

ID val1 val2 val3 ...
1   -1   -3   -5  ...
2   -3    1    4  ...
3    1    0    0  ...
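
As a sanity check on the expected table, here is a minimal plain-Python sketch of the same arithmetic, assuming "subtract" means the first row of each ID minus the last row in the order shown. The helper name first_minus_last is made up for illustration and is not part of PySpark.

```python
# Sample rows copied from the question: (ID, val1, val2, val3)
rows = [
    (1, 4, 1, 3),
    (1, 5, 4, 8),
    (2, 6, 3, 6),
    (2, 9, 2, 2),
    (3, 2, 1, 4),
    (3, 1, 1, 4),
]

def first_minus_last(rows):
    """Group rows by their first field; return first row minus last row per column."""
    groups = {}
    for key, *values in rows:
        groups.setdefault(key, []).append(values)
    return {
        key: tuple(f - l for f, l in zip(vals[0], vals[-1]))
        for key, vals in groups.items()
    }

result = first_minus_last(rows)
for key, diffs in result.items():
    print(key, *diffs)
```

This only mirrors the logic on a small in-memory list. It relies on the list order, which is exactly the ordering assumption the Spark versions also depend on.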

My current approach produces the desired output for one column at a time:

from pyspark.sql.functions import col, first, last
output = df.groupBy('id').agg(first(col('val1')) - last(col('val1')))

However, my data set has numerous columns, and I need a clean way to do this for all of them.

Check the code below. It spells out the first-minus-last expression for each column:

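from pyspark.sql.functions import col, first, last

# first()/last() depend on row order, which Spark does not guarantee after a shuffle.
(df
 .groupBy(col("id"))
 .agg(
     (first(col("val1")) - last(col("val1"))).alias("val1"),
     (first(col("val2")) - last(col("val2"))).alias("val2"),
     (first(col("val3")) - last(col("val3"))).alias("val3"),
 )
 .orderBy(col("id"), ascending=True)
 .show(truncate=False))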
+---+----+----+----+
|id |val1|val2|val3|
+---+----+----+----+
|1  |-1  |-3  |-5  |
|2  |-3  |1   |4   |
|3  |1   |0   |0   |
+---+----+----+----+

To avoid listing every column by hand, build the aggregation expressions from df.columns, skipping the grouping key:

agg_cols = [(first(col(c)) - last(col(c))).alias(c) for c in df.columns if c != "id"]
df.groupBy(col("id")).agg(*agg_cols).show()
+---+----+----+----+
| id|val1|val2|val3|
+---+----+----+----+
|  1|  -1|  -3|  -5|
|  3|   1|   0|   0|
|  2|  -3|   1|   4|
+---+----+----+----+

Let's register a UDF and use numpy's ptp, which returns max minus min for each group:

import numpy as np
import pyspark.sql.functions as F
from pyspark.sql.types import FloatType

# Register a UDF returning the range (max - min) of a collected list
c = F.udf(lambda x: float(np.ptp(x)), FloatType())
df.groupBy('id').agg(
    c(F.collect_list('val1')).alias('v1'),
    c(F.collect_list('val2')).alias('v2'),
    c(F.collect_list('val3')).alias('v3'),
).show()  # Apply the UDF to each column

+---+---+---+---+
| id| v1| v2| v3|
+---+---+---+---+
|  1|1.0|3.0|5.0|
|  2|3.0|1.0|4.0|
|  3|1.0|0.0|0.0|
+---+---+---+---+
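
The values in this table are non-negative, while the first-minus-last output in the question contains negative numbers. A small plain-Python sketch, using the val2 column from the question, may make the difference concrete. Here peak_to_peak is a hypothetical stand-in for what np.ptp computes on a one-dimensional list (max minus min), not the numpy function itself.

```python
# val2 values per ID, in the row order shown in the question
val2_by_id = {1: [1, 4], 2: [3, 2], 3: [1, 1]}

def peak_to_peak(values):
    """Max minus min: the quantity np.ptp returns for a 1-D sequence."""
    return max(values) - min(values)

def first_minus_last(values):
    """First element minus last element, as in the first()/last() approach."""
    return values[0] - values[-1]

# Map each ID to (first - last, max - min)
comparison = {
    key: (first_minus_last(vals), peak_to_peak(vals))
    for key, vals in val2_by_id.items()
}
print(comparison)
```

For ID 1 the two approaches differ in sign, so the UDF version matches the requested output only up to absolute value. On the other hand, it does not depend on row order within a group.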


 