How do I apply a function to a grouped dataframe in Python pandas?

Question

I am grouping my dataframe by one of its columns as follows (example with the iris dataset):

grouped_iris = iris.groupby(by="Name")

I would like to apply a function per group that does something specific with a subset of the columns in grouped_iris. How could I apply a function that, for each group (each value of Name), sums PetalLength and PetalWidth and puts the result in a new column called SumLengthWidth? I know that I can sum all the columns per group with agg like this:

grouped_iris.agg(sum)

But what I'm looking for is a twist on this: instead of summing all entries of a particular Name for each column, I want to sum just a subset of the columns (SepalWidth, SepalLength) for each Name group. Thanks.

Answer 1

This seems a bit inelegant, but it does the job:

grouped_iris[['PetalLength', 'PetalWidth']].sum().sum(axis=1)
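For readers who want to try the one-liner without loading the full dataset, here is a minimal self-contained sketch. The four-row frame is a made-up stand-in for iris (the values are illustrative, not the real data); only the column names come from the question.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data (values are made up).
iris = pd.DataFrame({
    "PetalLength": [1.4, 1.3, 4.7, 4.5],
    "PetalWidth": [0.2, 0.2, 1.4, 1.5],
    "SepalLength": [5.1, 4.7, 7.0, 6.4],
    "Name": ["Iris-setosa", "Iris-setosa",
             "Iris-versicolor", "Iris-versicolor"],
})

grouped_iris = iris.groupby(by="Name")

# First sum() gives one row per group with a total for each selected column;
# sum(axis=1) then adds those column totals into a single number per group.
totals = grouped_iris[["PetalLength", "PetalWidth"]].sum().sum(axis=1)
print(totals)
```

The result is a Series indexed by Name with one entry per group, so it cannot be assigned directly as a column of the original frame.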

Answer 2

I can't tell whether you want the aggregate numbers (in which case Andy's solution is what you want) or whether you want them transformed back onto the original dataframe. If it's the latter, you can use transform:

In [33]: cols = ['PetalLength', 'PetalWidth']

In [34]: transformed = grouped_iris[cols].transform(sum).sum(axis=1)

In [35]: iris['SumLengthWidth'] = transformed

In [36]: iris.head()
Out[36]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  SumLengthWidth
0          5.1         3.5          1.4         0.2  Iris-setosa            85.4
1          4.9         3.0          1.4         0.2  Iris-setosa            85.4
2          4.7         3.2          1.3         0.2  Iris-setosa            85.4
3          4.6         3.1          1.5         0.2  Iris-setosa            85.4
4          5.0         3.6          1.4         0.2  Iris-setosa            85.4
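The same steps can be sketched as a standalone script. The small frame below is a hypothetical stand-in for iris with made-up values, and the string alias "sum" is used in place of the builtin sum, which recent pandas versions prefer. The second computation (summing across the row first, then transforming by group) is an alternative ordering that is not part of the answer above; it should give the same totals.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data (values are made up).
iris = pd.DataFrame({
    "PetalLength": [1.4, 1.3, 4.7, 4.5],
    "PetalWidth": [0.2, 0.2, 1.4, 1.5],
    "Name": ["Iris-setosa", "Iris-setosa",
             "Iris-versicolor", "Iris-versicolor"],
})
cols = ["PetalLength", "PetalWidth"]

# transform("sum") keeps one row per original row, each holding the
# column sums of that row's group; sum(axis=1) adds the two columns.
iris["SumLengthWidth"] = iris.groupby("Name")[cols].transform("sum").sum(axis=1)

# Alternative ordering: add the two columns per row first,
# then broadcast the per-group sum of that row total.
alt = iris[cols].sum(axis=1).groupby(iris["Name"]).transform("sum")

print(iris)
```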

Edit: General case example

In general, for a dataframe df, aggregating the groupby with sum gives you the sum of each group:

In [47]: df
Out[47]: 
  Name  val1  val2
0  foo     6     3
1  bar    17     4
2  foo    16     6
3  bar     7     3
4  foo     6    13
5  bar     7     1

In [48]: grouped = df.groupby('Name')

In [49]: grouped.agg(sum)
Out[49]: 
      val1  val2
Name            
bar     31     8
foo     28    22

In your case, you're interested in summing these across the rows:

In [50]: grouped.agg(sum).sum(axis=1)
Out[50]: 
Name
bar     39
foo     50

But that only gives you two numbers, one for each group. In general, if you want those two numbers projected back onto the original dataframe, you want to use transform:

In [51]: grouped.transform(sum)
Out[51]: 
   val1  val2
0    28    22
1    31     8
2    28    22
3    31     8
4    28    22
5    31     8

Notice how these values are exactly the same as the values produced by agg, but the result has the same number of rows as the original df. Notice also how the values repeat on alternating rows, since rows [0, 2, 4] and [1, 3, 5] belong to the same groups. In your case, you want the sum of the two values, so you'd sum this across the rows:

In [52]: grouped.transform(sum).sum(axis=1)
Out[52]: 
0    50
1    39
2    50
3    39
4    50
5    39

You now have a series that's the same length as the original dataframe, so you can assign it back as a column (or do what you like with it):

In [53]: df['val1 + val2 by Name'] = grouped.transform(sum).sum(axis=1)

In [54]: df
Out[54]: 
  Name  val1  val2  val1 + val2 by Name
0  foo     6     3                   50
1  bar    17     4                   39
2  foo    16     6                   50
3  bar     7     3                   39
4  foo     6    13                   50
5  bar     7     1                   39
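As a cross-check on the walkthrough above, the following sketch rebuilds the example df and reaches the same column by a different route: compute one total per group with an aggregation, then look each row's Name up in that result with Series.map. This lookup approach is an alternative to transform, not something the answer itself proposes.

```python
import pandas as pd

# Same data as the example dataframe above.
df = pd.DataFrame({
    "Name": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "val1": [6, 17, 16, 7, 6, 7],
    "val2": [3, 4, 6, 3, 13, 1],
})

# One total per group: sum each column within the group,
# then add the column totals together.
totals = df.groupby("Name")[["val1", "val2"]].sum().sum(axis=1)

# Broadcast each group's total back to its rows via a lookup on Name.
df["val1 + val2 by Name"] = df["Name"].map(totals)

print(df)
```

The map version can be handy when the per-group totals are needed on their own as well, since totals is computed once and reused.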


Answer 1
2 2013-02-24 17:46:46

Answer 2
2 Accepted 2013-02-24 17:55:52
