How to force pandas.DataFrame.apply on a grouped DataFrame

Question

pandas.DataFrame.apply(myfunc) applies myfunc along columns. The behavior of pandas.core.groupby.DataFrameGroupBy.apply is more complicated. The difference shows up for functions myfunc such that frame.apply(myfunc) != myfunc(frame).

I would like to group a DataFrame, apply myfunc along the columns of each group's frame, and then paste the results together. There are hacky ways to do it, but I suspect there is some simple kwarg I'm missing.

Consider the example below:

In [22]: df = pd.DataFrame({'a':range(5), 'b': range(5, 10)})

In [23]: df
Out[23]: 
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9

In [24]: def myfunc(data):
             # Implements max in a funny way.
             # However, this is just an example of a function such that 
             # myfunc(frame) != frame.apply(myfunc)
             return data.values.ravel().max()

In [25]: df.apply(myfunc)
Out[25]: 
a    4
b    9

In [26]: df.groupby(df.a < 2).apply(myfunc)
Out[26]: 
a
False    9
True     6

As you can see, myfunc was called as myfunc(group). This default behavior is reasonable, since myfunc takes in a DataFrame and returns a number, but it is not always what I want. Is there a canonical way to force myfunc to be applied along the columns of each group, as in group.apply(myfunc)? The best I can come up with is an awkward wrapper:

In [27]: def wrapped(frame):
   ....:     return frame.apply(myfunc)

In [28]: df.groupby(df.a < 2).apply(wrapped)
Out[28]: 
       a  b
a          
False  4  9
True   1  6
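
To make the difference concrete, here is a small self-contained sketch (not part of the original session) that records what kind of object myfunc receives in each case. The seen list and the two *_types variables are names introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

seen = []  # hypothetical helper: type name of each object passed to myfunc

def myfunc(data):
    seen.append(type(data).__name__)
    return data.values.ravel().max()

# DataFrame.apply hands myfunc one column (a Series) at a time.
per_column = df.apply(myfunc)
column_wise_types = set(seen)

seen.clear()

# DataFrameGroupBy.apply hands myfunc each group's whole sub-frame.
per_group = df.groupby(df.a < 2).apply(myfunc)
group_wise_types = set(seen)

print(column_wise_types, group_wise_types)
```

So the wrapper above is really just converting the "one DataFrame per group" call into "one Series per column per group".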

Answer 1

You can do this:

In [25]: df.groupby(df.a<2).aggregate(myfunc)
Out[25]: 
       a  b
a          
False  4  9
True   1  6

[2 rows x 2 columns]

But for this particular example, where myfunc only computes a max, this is simpler:

In [26]: df.groupby(df.a<2).max()
Out[26]: 
       a  b
a          
False  4  9
True   1  6

[2 rows x 2 columns]
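
As a sanity check, the sketch below compares the aggregate call with the wrapper from the question on the same frame. It assumes myfunc reduces each column to a scalar, which is what aggregate expects; the variable names via_agg and via_wrapper are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

def myfunc(data):
    # Same toy reduction as in the question.
    return data.values.ravel().max()

def wrapped(frame):
    # The question's wrapper: column-wise apply inside each group.
    return frame.apply(myfunc)

grouped = df.groupby(df.a < 2)

via_agg = grouped.aggregate(myfunc)    # myfunc sees each column of each group
via_wrapper = grouped.apply(wrapped)   # same idea, spelled out by hand

print(via_agg)
print(via_wrapper)
```

Both results should hold one row per group and one column per original column. If myfunc returned something other than a scalar per column, aggregate would likely not be a drop-in replacement and the wrapper would still be needed.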

Answer 1 posted 2014-05-13 15:54:45
