
How to apply functions to grouped dataframes in Python pandas?

I am grouping my dataframe by one of its columns as follows (example with iris dataset):

grouped_iris = iris.groupby(by="Name")

I would like to apply a function per group that operates on a subset of the columns in grouped_iris. How can I apply a function that, for each group (each value of Name), sums PetalLength and PetalWidth and puts the result in a new column called SumLengthWidth? I know that I can sum all the columns per group with agg like this:

grouped_iris.agg(sum)

But what I'm looking for is a twist on this: instead of summing all entries of a particular Name for each column, I want to sum just a subset of the columns (SepalWidth, SepalLength) for each Name group. Thanks.

This seems a little inelegant, but it does the job:

grouped_iris[['PetalLength', 'PetalWidth']].sum().sum(axis=1)
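A self-contained sketch of the one-liner above may help make the two sum steps concrete. The frame `iris_small` and its values are hypothetical stand-ins for the iris data, not taken from the question; the first `sum()` totals each column within a group, and `sum(axis=1)` then adds the two per-group totals together.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data (values made up for illustration).
iris_small = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.0, 6.4],
    "SepalWidth": [3.5, 3.0, 3.2, 3.2],
    "PetalLength": [1.4, 1.4, 4.7, 4.5],
    "PetalWidth": [0.2, 0.2, 1.4, 1.5],
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor"],
})

grouped_small = iris_small.groupby(by="Name")

# Step 1: restrict the groupby to the two columns and sum within each group.
# Step 2: add the two per-group column totals together (axis=1).
per_group = grouped_small[["PetalLength", "PetalWidth"]].sum().sum(axis=1)

# Give the resulting Series the name asked for in the question.
per_group = per_group.rename("SumLengthWidth")
print(per_group)
```

The result has one entry per Name, so it answers the "aggregate" reading of the question rather than adding a column to the original frame.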

I can't tell whether you want the aggregate numbers (in which case Andy's solution is what you want) or want them broadcast back onto the original dataframe. If it's the latter, you can use transform:

In [33]: cols = ['PetalLength', 'PetalWidth']

In [34]: transformed = grouped_iris[cols].transform(sum).sum(axis=1)

In [35]: iris['SumLengthWidth'] = transformed

In [36]: iris.head()
Out[36]: 
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  SumLengthWidth
0          5.1         3.5          1.4         0.2  Iris-setosa            85.4
1          4.9         3.0          1.4         0.2  Iris-setosa            85.4
2          4.7         3.2          1.3         0.2  Iris-setosa            85.4
3          4.6         3.1          1.5         0.2  Iris-setosa            85.4
4          5.0         3.6          1.4         0.2  Iris-setosa            85.4
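The session above can be reproduced without the full dataset. The sketch below uses a small hypothetical frame (`iris_small`, values made up) with interleaved groups to show that the transformed values line up with the original rows. It passes the string alias `"sum"` instead of the builtin `sum`; this is a substitution on my part, since recent pandas releases may emit a warning when a builtin is passed to `transform` or `agg`.

```python
import pandas as pd

# Hypothetical data with the two species interleaved, so row alignment is visible.
iris_small = pd.DataFrame({
    "PetalLength": [1.4, 4.7, 1.4, 4.5],
    "PetalWidth": [0.2, 1.4, 0.2, 1.5],
    "Name": ["Iris-setosa", "Iris-versicolor", "Iris-setosa", "Iris-versicolor"],
})

cols = ["PetalLength", "PetalWidth"]

# transform("sum") returns a frame with the same index as iris_small, where each
# row holds its group's column totals; sum(axis=1) then adds the two totals.
iris_small["SumLengthWidth"] = (
    iris_small.groupby("Name")[cols].transform("sum").sum(axis=1)
)
print(iris_small)
```

Every row of a given species receives the same SumLengthWidth, which is the behaviour shown in the Out[36] table.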

Edit: General case example

In general, for a dataframe df, aggregating the groupby with sum gives you the sum of each column within each group:

In [47]: df
Out[47]: 
  Name  val1  val2
0  foo     6     3
1  bar    17     4
2  foo    16     6
3  bar     7     3
4  foo     6    13
5  bar     7     1

In [48]: grouped = df.groupby('Name')

In [49]: grouped.agg(sum)
Out[49]: 
      val1  val2
Name            
bar     31     8
foo     28    22

In your case, you're interested in summing these per-group totals across the columns of each row (axis=1):

In [50]: grouped.agg(sum).sum(axis=1)
Out[50]: 
Name
bar     39
foo     50

But that only gives you two numbers, one for each group. If you want those two numbers projected back onto the original dataframe, use transform:

In [51]: grouped.transform(sum)
Out[51]: 
   val1  val2
0    28    22
1    31     8
2    28    22
3    31     8
4    28    22
5    31     8

Notice that these values are exactly the ones produced by agg, but the result has the same number of rows (and the same index) as the original df. Notice also that the values alternate, since rows [0, 2, 4] belong to the foo group and rows [1, 3, 5] belong to the bar group. In your case, you want the sum of the two values, so you'd sum this across the columns of each row:

In [52]: grouped.transform(sum).sum(axis=1)
Out[52]: 
0    50
1    39
2    50
3    39
4    50
5    39
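To make the difference between agg and transform explicit, here is a short script that rebuilds the df from the session and compares the shape and index of both results. The variable names `aggregated` and `transformed` are mine, and the string alias `"sum"` is used in place of the builtin `sum` as an assumption about what newer pandas versions prefer.

```python
import pandas as pd

# Same data as the df shown in the session above.
df = pd.DataFrame({
    "Name": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "val1": [6, 17, 16, 7, 6, 7],
    "val2": [3, 4, 6, 3, 13, 1],
})
grouped = df.groupby("Name")

# agg collapses each group to a single row, indexed by the group key.
aggregated = grouped.agg("sum")

# transform keeps one row per original row and preserves the original index.
transformed = grouped.transform("sum")

print(aggregated.shape)                     # rows = number of groups
print(transformed.shape)                    # rows = number of rows in df
print(transformed.index.equals(df.index))   # True when the index is preserved
```

Because `transformed` shares df's index, anything derived from it row by row can be assigned straight back to df as a new column.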

You now have a series that's the same length as the original dataframe, so you can assign it back as a column (or do what you like with it):

In [53]: df['val1 + val2 by Name'] = grouped.transform(sum).sum(axis=1)

In [54]: df
Out[54]: 
  Name  val1  val2  val1 + val2 by Name
0  foo     6     3                   50
1  bar    17     4                   39
2  foo    16     6                   50
3  bar     7     3                   39
4  foo     6    13                   50
5  bar     7     1                   39
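As a possible alternative (not the method used in the answer above), the order of the two sums can be swapped: add val1 and val2 within each row first, then total those row sums per group. This is only a sketch; the name `row_totals` is hypothetical, and the result should match the column produced above because addition is order-independent.

```python
import pandas as pd

# Same data as the df shown in the session above.
df = pd.DataFrame({
    "Name": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "val1": [6, 17, 16, 7, 6, 7],
    "val2": [3, 4, 6, 3, 13, 1],
})

# Step 1: sum the two value columns within each row.
row_totals = df[["val1", "val2"]].sum(axis=1)

# Step 2: group the row sums by Name and broadcast each group's total
# back onto the original rows.
df["val1 + val2 by Name"] = row_totals.groupby(df["Name"]).transform("sum")
print(df)
```

This variant transforms a single Series instead of a two-column frame, which some readers may find easier to follow; whether it is preferable is a matter of taste.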


 