How to force pandas.DataFrame.apply on a grouped DataFrame

pandas.DataFrame.apply(myfunc) applies myfunc along columns. The behavior of pandas.core.groupby.DataFrameGroupBy.apply is more complicated. The difference shows up for functions myfunc such that frame.apply(myfunc) != myfunc(frame).

I would like to group a DataFrame, apply myfunc along the columns of each individual frame (one per group), and then paste the results together. There are hacky ways to do it, but it seems like there is some simple kwarg I'm missing.

Consider the example below:

In [21]: import pandas as pd

In [22]: df = pd.DataFrame({'a': range(5), 'b': range(5, 10)})

In [23]: df
Out[23]: 
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9

In [24]: def myfunc(data):
   ....:     # Implements max in a funny way.
   ....:     # However, this is just an example of a function such that
   ....:     # myfunc(frame) != frame.apply(myfunc)
   ....:     return data.values.ravel().max()

In [25]: df.apply(myfunc)
Out[25]: 
a    4
b    9

In [26]: df.groupby(df.a < 2).apply(myfunc)
Out[26]: 
a
False    9
True     6

As you can see, myfunc was called like myfunc(group). This default behavior is reasonable, since myfunc takes in a DataFrame and returns a number, but it is not always what I want. Is there a canonical way to force myfunc to be applied along the columns of each group, as in group.apply(myfunc)? The best I can come up with is an awkward wrapper:

In [27]: def wrapped(frame):
   ....:     return frame.apply(myfunc)

In [28]: df.groupby(df.a < 2).apply(wrapped)
Out[28]: 
       a  b
a          
False  4  9
True   1  6
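
For anyone who wants to reproduce the two behaviors outside an IPython session, here is a minimal standalone sketch. It reuses df and myfunc from the question; the variable names key, per_group and per_group_column are just illustrative, and the wrapper is written inline as a lambda rather than as a named function. Exact printed formatting may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})


def myfunc(data):
    # Same example function as in the question: the maximum over every
    # value it is handed, whether that is a Series or a whole DataFrame.
    return data.values.ravel().max()


# Boolean Series used as the grouping key (illustrative name).
key = df["a"] < 2

# groupby.apply hands each group to myfunc as a whole DataFrame,
# so the result is one number per group.
per_group = df.groupby(key).apply(myfunc)

# Calling DataFrame.apply inside each group makes myfunc see one
# column (a Series) at a time, giving one number per group and column.
per_group_column = df.groupby(key).apply(lambda g: g.apply(myfunc))

print(per_group)
print(per_group_column)
```

The first result is a Series indexed by group, the second a DataFrame with one row per group and one column per original column, matching Out[26] and Out[28] above.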

You can do this:

In [25]: df.groupby(df.a<2).aggregate(myfunc)
Out[25]: 
       a  b
a          
False  4  9
True   1  6

[2 rows x 2 columns]

But this is simpler:

In [26]: df.groupby(df.a<2).max()
Out[26]: 
       a  b
a          
False  4  9
True   1  6

[2 rows x 2 columns]
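
As a rough check that these options agree, the sketch below compares aggregate(myfunc), the built-in max(), and the wrapper from the question on the same data. The names key, via_aggregate, via_max and via_wrapper are illustrative only. Note that max() is only a substitute here because this particular myfunc happens to compute a maximum; for a function with no built-in groupby equivalent, aggregate(myfunc) is the option that carries over.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})


def myfunc(data):
    # Example function from the question: max over all values it receives.
    return data.values.ravel().max()


# Boolean Series used as the grouping key (illustrative name).
key = df["a"] < 2

# aggregate calls myfunc once per column within each group.
via_aggregate = df.groupby(key).aggregate(myfunc)

# Built-in reduction; equivalent only because myfunc computes a max.
via_max = df.groupby(key).max()

# The wrapper approach from the question, written inline.
via_wrapper = df.groupby(key).apply(lambda g: g.apply(myfunc))

print(via_aggregate)
print(via_max)
print(via_wrapper)
```

All three print the same 2 x 2 table of per-group, per-column maxima on this data.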
