How to apply functions to grouped dataframes in Python pandas?
I am grouping my dataframe by one of its columns as follows (example with the iris dataset):
grouped_iris = iris.groupby(by="Name")
I would like to apply a function per group that does something specific with a subset of the columns in grouped_iris. How could I apply a function that, for each group (each value of Name), sums PetalLength and PetalWidth and puts the result in a new column called SumLengthWidth? I know that I can sum all the columns per group with agg like this:
grouped_iris.agg(sum)
But what I'm looking for is a twist on this: instead of summing all entries of a particular Name for each column, I want to sum just a subset of the columns (SepalWidth, SepalLength) for each Name group. Thanks.
This seems a bit inelegant, but it does the job:
grouped_iris[['PetalLength', 'PetalWidth']].sum().sum(axis=1)
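For readers without the iris file at hand, here is a minimal self-contained sketch of the one-liner above. The four-row dataframe is a made-up stand-in for the iris data (its values are assumptions chosen for easy arithmetic), so only the column names match the question.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data; values are invented.
iris = pd.DataFrame({
    "PetalLength": [1.0, 2.0, 4.0, 5.0],
    "PetalWidth": [0.5, 0.5, 1.0, 2.0],
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

grouped_iris = iris.groupby(by="Name")

# First sum each selected column within each group (one row per Name),
# then add the two column totals together across each row.
per_group = grouped_iris[["PetalLength", "PetalWidth"]].sum().sum(axis=1)
print(per_group)
```

The result is a Series indexed by Name with one value per group, which is the aggregate form rather than a new column on the original dataframe.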
I can't tell whether you want the aggregate numbers (in which case Andy's solution is what you want) or whether you want them transformed back onto the original dataframe. If it's the latter, you can use transform:
In [33]: cols = ['PetalLength', 'PetalWidth']
In [34]: transformed = grouped_iris[cols].transform(sum).sum(axis=1)
In [35]: iris['SumLengthWidth'] = transformed
In [36]: iris.head()
Out[36]:
   SepalLength  SepalWidth  PetalLength  PetalWidth         Name  SumLengthWidth
0          5.1         3.5          1.4         0.2  Iris-setosa            85.4
1          4.9         3.0          1.4         0.2  Iris-setosa            85.4
2          4.7         3.2          1.3         0.2  Iris-setosa            85.4
3          4.6         3.1          1.5         0.2  Iris-setosa            85.4
4          5.0         3.6          1.4         0.2  Iris-setosa            85.4
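The same transform step can be sketched as a standalone script. This version assumes a small invented dataframe in place of the real iris data, and it passes the string "sum" rather than the builtin sum; as far as I know, recent pandas releases warn about the builtin and recommend the string form, but treat that as an assumption to check against your installed version.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data; values are invented.
iris = pd.DataFrame({
    "PetalLength": [1.0, 2.0, 4.0, 5.0],
    "PetalWidth": [0.5, 0.5, 1.0, 2.0],
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

cols = ["PetalLength", "PetalWidth"]

# transform returns one row per original row, each holding its group's total,
# so the row-wise sum lines up with the original index and can be assigned back.
iris["SumLengthWidth"] = iris.groupby("Name")[cols].transform("sum").sum(axis=1)
print(iris)
```

Every row of a group receives the same SumLengthWidth value, mirroring the repeated 85.4 in the output above.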
Edit: General case example
In general, for a dataframe df, aggregating the groupby with sum gives you the sum of each group:
In [47]: df
Out[47]:
Name val1 val2
0 foo 6 3
1 bar 17 4
2 foo 16 6
3 bar 7 3
4 foo 6 13
5 bar 7 1
In [48]: grouped = df.groupby('Name')
In [49]: grouped.agg(sum)
Out[49]:
val1 val2
Name
bar 31 8
foo 28 22
In your case, you're interested in summing these across the rows:
In [50]: grouped.agg(sum).sum(axis=1)
Out[50]:
Name
bar 39
foo 50
But that only gives you 2 numbers, 1 for each group. In general, if you want those two numbers projected back onto the original dataframe, you want to use transform:
In [51]: grouped.transform(sum)
Out[51]:
val1 val2
0 28 22
1 31 8
2 28 22
3 31 8
4 28 22
5 31 8
Notice how these values are exactly the same as the values produced by agg, but the result has the same dimensions as the original df. Notice also how every other value is repeated, since rows [0, 2, 4] and [1, 3, 5] belong to the same groups. In your case, you want the sum of the two values, so you'd sum this across the rows:
In [52]: grouped.transform(sum).sum(axis=1)
Out[52]:
0 50
1 39
2 50
3 39
4 50
5 39
You now have a Series that's the same length as the original dataframe, so you can assign it back as a column (or do what you like with it):
In [53]: df['val1 + val2 by Name'] = grouped.transform(sum).sum(axis=1)
In [54]: df
Out[54]:
  Name  val1  val2  val1 + val2 by Name
0  foo     6     3                   50
1  bar    17     4                   39
2  foo    16     6                   50
3  bar     7     3                   39
4  foo     6    13                   50
5  bar     7     1                   39
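The general example can be reproduced end to end with the sketch below, which rebuilds df from the values shown in In [47]. It also shows an alternative ordering that is not part of the answer: add the columns row by row first, then take the per-group total. The variable names by_transform, row_totals and by_rows_first are my own, and the string "sum" is used in place of the builtin on the assumption that your pandas version accepts it.

```python
import pandas as pd

# Same data as the df printed in In [47].
df = pd.DataFrame({
    "Name": ["foo", "bar", "foo", "bar", "foo", "bar"],
    "val1": [6, 17, 16, 7, 6, 7],
    "val2": [3, 4, 6, 3, 13, 1],
})

# Approach from the answer: per-group column totals broadcast to every row,
# then added across the two columns.
by_transform = df.groupby("Name")[["val1", "val2"]].transform("sum").sum(axis=1)

# Alternative ordering: add val1 and val2 within each row first,
# then total those row sums per group and broadcast back.
row_totals = df[["val1", "val2"]].sum(axis=1)
by_rows_first = row_totals.groupby(df["Name"]).transform("sum")

df["val1 + val2 by Name"] = by_transform
print(df)
```

Both orderings give the same Series here because addition is associative; the second one groups a single Series instead of a two-column frame, which some readers may find easier to follow.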