How to force pandas.DataFrame.apply on a grouped DataFrame
The behavior of pandas.DataFrame.apply(myfunc) is to apply myfunc along columns. The behavior of pandas.core.groupby.DataFrameGroupBy.apply is more complicated. This difference shows up for functions myfunc such that frame.apply(myfunc) != myfunc(frame).
I would like to group a DataFrame, then apply myfunc along the columns of each individual frame (in each group), and then paste the results together. There are hacky ways to do it, but it seems like there should be some simple kwarg I'm missing.
Consider the example below:
In [22]: df = pd.DataFrame({'a':range(5), 'b': range(5, 10)})
In [23]: df
Out[23]:
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
In [24]: def myfunc(data):
   ...:     # Implements max in a funny way.
   ...:     # However, this is just an example of a function such that
   ...:     # myfunc(frame) != frame.apply(myfunc)
   ...:     return data.values.ravel().max()
In [25]: df.apply(myfunc)
Out[25]:
a    4
b    9
In [26]: df.groupby(df.a < 2).apply(myfunc)
Out[26]:
a
False    9
True     6
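To see what each call actually receives, the sketch below records the type of object handed to the function. It assumes a reasonably recent pandas (1.x or 2.x). The factory record_type is a hypothetical helper introduced only for this illustration. Its reducer computes the same "funny max" as myfunc.

```python
import pandas as pd

df = pd.DataFrame({'a': range(5), 'b': range(5, 10)})

def record_type(log):
    # Hypothetical helper: returns a reducer that appends the type name of
    # its argument to `log`, then computes the same "funny max" as myfunc.
    def reducer(data):
        log.append(type(data).__name__)
        return data.values.ravel().max()
    return reducer

key = df.a < 2
frame_apply, group_apply, group_agg = [], [], []

df.apply(record_type(frame_apply))               # called once per column
df.groupby(key).apply(record_type(group_apply))  # called once per group
df.groupby(key).agg(record_type(group_agg))      # called per column, per group

print(set(frame_apply))
print(set(group_apply))
print(set(group_agg))
```

DataFrame.apply and GroupBy.agg hand the function a Series (one column at a time). GroupBy.apply hands it the whole group as a DataFrame.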
As you can see, myfunc was called like myfunc(group). This default behavior is reasonable, since myfunc takes in a DataFrame and returns a number, but it is not always what I want. Is there a canonical way to force myfunc to be applied along the columns of each group, as in group.apply(myfunc)? The best I can come up with is an awkward wrapper:
In [27]: def wrapped(frame):
   ...:     return frame.apply(myfunc)
In [28]: df.groupby(df.a < 2).apply(wrapped)
Out[28]:
       a  b
a
False  4  9
True   1  6
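The same wrapper can be written inline with a lambda. This is only the wrapper above in a shorter form, not a separate pandas option. The sketch redefines df and myfunc so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'a': range(5), 'b': range(5, 10)})

def myfunc(data):
    # Same "funny max" as in the question.
    return data.values.ravel().max()

# Inline version of `wrapped`: each group `g` is a DataFrame, and
# g.apply(myfunc) runs myfunc once per column of that group.
result = df.groupby(df.a < 2).apply(lambda g: g.apply(myfunc))
print(result)
```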
You can do this with aggregate, which applies myfunc to each column of each group:
In [25]: df.groupby(df.a<2).aggregate(myfunc)
Out[25]:
       a  b
a
False  4  9
True   1  6
[2 rows x 2 columns]
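Here is a quick check that aggregate with a plain callable builds the same table as the question's wrapper. It assumes a recent pandas and reuses the df, myfunc and wrapped definitions from the question.

```python
import pandas as pd

df = pd.DataFrame({'a': range(5), 'b': range(5, 10)})

def myfunc(data):
    # The question's "funny max"; accepts a Series or a DataFrame.
    return data.values.ravel().max()

def wrapped(frame):
    # The question's wrapper: apply myfunc column by column within one group.
    return frame.apply(myfunc)

grouped = df.groupby(df.a < 2)
via_agg = grouped.aggregate(myfunc)   # myfunc sees one column of one group
via_wrapper = grouped.apply(wrapped)  # same columns, reached via frame.apply

print(via_agg.to_dict() == via_wrapper.to_dict())
```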
But since this particular myfunc just computes a maximum, this is simpler:
In [26]: df.groupby(df.a<2).max()
Out[26]:
       a  b
a
False  4  9
True   1  6
[2 rows x 2 columns]
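The .max() shortcut works only because this myfunc happens to be a maximum. For a column-wise reducer with no built-in equivalent, aggregate with a callable is the general route. In the sketch below, value_range is a hypothetical example reducer (it is not part of pandas). The built-in comparison is included just to confirm the result.

```python
import pandas as pd

df = pd.DataFrame({'a': range(5), 'b': range(5, 10)})

def value_range(data):
    # Hypothetical reducer: spread (max - min) of the values it receives.
    values = data.values.ravel()
    return values.max() - values.min()

grouped = df.groupby(df.a < 2)

custom = grouped.aggregate(value_range)  # called per column, per group
builtin = grouped.max() - grouped.min()  # same numbers from built-ins

print(custom)
```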