

Spark groupby sum for all columns except 1

I have a dataset with a header like this:

|State|2020-01-22|2020-01-23|2020-01-24|2020-01-25|2020-01-26|2020-01-27|2020-01-28|

and I am trying to groupBy on the State column and get the sum of the row values for each of the other columns (the number of columns stays the same). But when I do it using:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F
df = df.groupBy('State').agg(F.sum())

But I get the error: sum() missing 1 required positional argument: 'col'. How do I get the sum of the row values for each column? I also tried this:

df = df.groupBy('State').agg(F.sum('2020-01-22','2020-01-23'))

and I get the error: sum() takes 1 positional argument but 2 were given.

Thank you for helping me.

Use a list comprehension to iterate over all columns except the grouping column:

 import pyspark.sql.functions as F
 df.groupBy('State').agg(*[F.sum(c).alias(f"sum_{c}") for c in df.drop('State').columns]).show()
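
For context, here is a minimal runnable sketch of that approach on a made-up toy frame (the State values and the two date columns below are invented for illustration, not taken from the question's data):

 from pyspark.sql import SparkSession
 import pyspark.sql.functions as F

 spark = SparkSession.builder.getOrCreate()
 # hypothetical toy data: one grouping column plus two date columns
 df = spark.createDataFrame(
     [("NY", 1, 2), ("NY", 3, 4), ("CA", 5, 6)],
     ["State", "2020-01-22", "2020-01-23"],
 )

 # build one sum(...) expression per non-grouping column, then unpack the list into agg()
 agg_exprs = [F.sum(c).alias(f"sum_{c}") for c in df.drop("State").columns]
 df.groupBy("State").agg(*agg_exprs).show()
 # prints one row per State with columns sum_2020-01-22 and sum_2020-01-23

The alias(f"sum_{c}") part is only there to keep the original column names readable; without it Spark names the result columns sum(2020-01-22), sum(2020-01-23), and so on.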

Simply note that the GroupedData object returned by df.groupBy() has a sum method that sums all numeric columns when called with no arguments:

>>> df.show()
+-----+---+---+
|state|  a|  b|
+-----+---+---+
|    a|  5|  5|
|    a|  6|  6|
|    b| 10| 10|
+-----+---+---+

>>> df.groupBy("state").sum().show()
+-----+------+------+
|state|sum(a)|sum(b)|
+-----+------+------+
|    b|    10|    10|
|    a|    11|    11|
+-----+------+------+
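
If only some of the columns need to be summed, GroupedData.sum also accepts specific column names; a small sketch on the same toy frame:

>>> df.groupBy("state").sum("a").show()
+-----+------+
|state|sum(a)|
+-----+------+
|    b|    10|
|    a|    11|
+-----+------+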
