I am grouping my dataframe by one of its columns as follows (example with the iris dataset):
grouped_iris = iris.groupby(by="Name")
I would like to apply a function per group that does something specific with a subset of the columns in grouped_iris. How could I apply a function that, for each group (each value of Name), sums PetalLength and PetalWidth and puts the result in a new column called SumLengthWidth? I know that I can sum all the columns per group with agg like this:
grouped_iris.agg(sum)
But what I'm looking for is a twist on this: instead of summing all entries of a particular Name for each column, I want to sum just a subset of the columns (SepalWidth, SepalLength) for each Name group. Thanks.
This seems a little inelegant, but it does the job:
grouped_iris[['PetalLength', 'PetalWidth']].sum().sum(axis=1)
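To make the one-liner above concrete, here is a minimal sketch on a tiny stand-in frame. The three rows and their values are made up for illustration (they are not the real iris data); only the column names come from the question. It selects the two columns on the groupby, sums each column within a group, then adds the two column totals together, giving one number per Name.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data (values are made up).
iris = pd.DataFrame({
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
    "PetalLength": [1.0, 2.0, 5.0],
    "PetalWidth": [0.5, 0.5, 2.0],
})

grouped_iris = iris.groupby(by="Name")

# Per-group column sums first, then add the two column totals per group.
per_group = grouped_iris[["PetalLength", "PetalWidth"]].sum().sum(axis=1)
print(per_group)
```

The result is a Series indexed by Name, so it has one entry per group rather than one per row of the original frame.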
I can't tell whether you want the aggregate numbers (in which case Andy's solution is what you want) or whether you want the result transformed back onto the original dataframe. If it's the latter, you can use transform:
In [33]: cols = ['PetalLength', 'PetalWidth']
In [34]: transformed = grouped_iris[cols].transform(sum).sum(axis=1)
In [35]: iris['SumLengthWidth'] = transformed
In [36]: iris.head()
Out[36]:
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  SumLengthWidth
0          5.1         3.5          1.4         0.2  Iris-setosa            85.4
1          4.9         3.0          1.4         0.2  Iris-setosa            85.4
2          4.7         3.2          1.3         0.2  Iris-setosa            85.4
3          4.6         3.1          1.5         0.2  Iris-setosa            85.4
4          5.0         3.6          1.4         0.2  Iris-setosa            85.4
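For anyone who wants to try the transform approach without loading the iris file, here is a small self-contained sketch. The frame is a made-up stand-in (hypothetical values, real column names), and it passes the string alias "sum", which pandas also accepts, in place of the builtin sum used in the session above.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data (values are made up).
iris = pd.DataFrame({
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
    "PetalLength": [1.0, 2.0, 5.0],
    "PetalWidth": [0.5, 0.5, 2.0],
})

cols = ["PetalLength", "PetalWidth"]

# transform keeps the original row index, so every row receives its group's
# total; summing across axis=1 then adds the two column totals together.
iris["SumLengthWidth"] = iris.groupby("Name")[cols].transform("sum").sum(axis=1)
print(iris)
```

Every row of a group ends up with the same SumLengthWidth, which is the same repetition visible in the output above.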
Edit: General case example
In general, for a dataframe df, aggregating the groupby with sum gives you the sum of each group:
In [47]: df
Out[47]:
   Name  val1  val2
0   foo     6     3
1   bar    17     4
2   foo    16     6
3   bar     7     3
4   foo     6    13
5   bar     7     1
In [48]: grouped = df.groupby('Name')
In [49]: grouped.agg(sum)
Out[49]:
      val1  val2
Name
bar     31     8
foo     28    22
In your case, you're interested in summing these across the rows:
In [50]: grouped.agg(sum).sum(axis=1)
Out[50]:
Name
bar 39
foo 50
But that only gives you two numbers, one for each group. In general, if you want those two numbers projected back onto the original dataframe, you want to use transform:
In [51]: grouped.transform(sum)
Out[51]:
   val1  val2
0    28    22
1    31     8
2    28    22
3    31     8
4    28    22
5    31     8
Notice how these values are exactly the same as the values produced by agg, but the result has the same dimensions as the original df. Notice also how every other value is repeated, since rows [0, 2, 4] and [1, 3, 5] belong to the same groups. In your case, you want the sum of the two values, so you'd sum this across the rows.
In [52]: grouped.transform(sum).sum(axis=1)
Out[52]:
0 50
1 39
2 50
3 39
4 50
5 39
You now have a series that's the same length as the original dataframe, so you can assign it back as a column (or do what you like with it):
In [53]: df['val1 + val2 by Name'] = grouped.transform(sum).sum(axis=1)
In [54]: df
Out[54]:
   Name  val1  val2  val1 + val2 by Name
0   foo     6     3                   50
1   bar    17     4                   39
2   foo    16     6                   50
3   bar     7     3                   39
4   foo     6    13                   50
5   bar     7     1                   39
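As a cross-check of the walkthrough, here is a runnable sketch that rebuilds the same df and computes the column two ways. Order A follows the steps above, with the value columns selected explicitly. Order B is an alternative not taken from the answer: add val1 and val2 per row first, then sum that within each group. Since addition is order-independent, the two should agree. The names a, b and cols are just illustrative.

```python
import pandas as pd

# Same data as the df shown in the walkthrough above.
df = pd.DataFrame({
    "Name": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "val1": [6, 17, 16, 7, 6, 7],
    "val2": [3, 4, 6, 3, 13, 1],
})
cols = ["val1", "val2"]

# Order A: group sums per column, projected onto the rows, then added across columns.
a = df.groupby("Name")[cols].transform("sum").sum(axis=1)

# Order B: add across columns per row first, then sum within each group.
b = df[cols].sum(axis=1).groupby(df["Name"]).transform("sum")

df["val1 + val2 by Name"] = a
print(df)
print(a.equals(b))
```

Order B can be convenient when the per-row sum is something you want to keep as a column anyway. Otherwise the choice is mostly a matter of taste.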