How to use groupby to apply multiple functions to multiple columns in Pandas?

I have a regular DataFrame:

import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

Following this recipe, I got the results I wanted:

In [62]: A.groupby((A['A'] > 2)).apply(lambda x: pd.Series(dict(
                   up_B=(x.B >= 0).sum(), down_B=(x.B < 0).sum(), mean_B=(x.B).mean(), std_B=(x.B).std(),
                   up_C=(x.C >= 0).sum(), down_C=(x.C < 0).sum(), mean_C=(x.C).mean(), std_C=(x.C).std())))

Out[62]:
       down_B  down_C  mean_B    mean_C     std_B     std_C  up_B  up_C
A                                                                      
False       0       0     4.5  3.000000  0.707107  1.414214     2     2
True        0       0     2.0  2.333333  1.000000  1.527525     3     3

This approach works, but imagine doing this for a large number of columns (15-100): you would have to type every expression into the formula by hand, which is cumbersome.

Given that the same formulas are applied to all columns, is there an efficient way to do this for a large number of columns?

Thanks

Since you are aggregating each grouped column into a single value, you can use agg instead of apply. The agg method accepts a list of functions, and each function is applied to every column:

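def up(x):
    # number of non-negative values in the group
    return (x >= 0).sum()

def down(x):
    # number of negative values in the group
    return (x < 0).sum()

# Group columns B through C by whether A > 2 and apply all four functions to each
result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])
print(result)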

yields

       B                      C                         
      up down mean       std up down      mean       std
A                                                       
False  2    0  4.5  0.707107  2    0  3.000000  1.414214
True   3    0  2.0  1.000000  3    0  2.333333  1.527525
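If you prefer flat column names like the up_B / mean_C labels from the question, the (column, statistic) tuples in the MultiIndex can be joined back into strings. A minimal sketch, reusing the A, up and down definitions from above; the name flat and the statistic_column naming scheme are illustrative choices, not something agg produces on its own:

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

def up(x):
    # number of non-negative values in the group
    return (x >= 0).sum()

def down(x):
    # number of negative values in the group
    return (x < 0).sum()

result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])

# Each column label is a (column, statistic) tuple such as ('B', 'up').
# Join them as "<statistic>_<column>" to get names like up_B.
flat = result.copy()
flat.columns = [f'{stat}_{col}' for col, stat in flat.columns]
print(flat)
```

Because the comprehension walks result.columns, it works unchanged no matter how many columns were aggregated.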

result has hierarchical (MultiIndex) columns. To select a particular column (or several columns), you could use:

In [39]: result['B','mean']
Out[39]: 
A
False    4.5
True     2.0
Name: (B, mean), dtype: float64

In [46]: result[[('B', 'mean'), ('C', 'mean')]]
Out[46]: 
         B         C
      mean      mean
A                   
False  4.5  3.000000
True   2.0  2.333333
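With many columns, listing every (column, statistic) tuple gets as tedious as the original formula. DataFrame.xs can take a cross-section of the inner column level instead, returning one statistic for every column at once. A sketch assuming the same A, up and down as above (means is just an illustrative name):

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

def up(x):
    # number of non-negative values in the group
    return (x >= 0).sum()

def down(x):
    # number of negative values in the group
    return (x < 0).sum()

result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])

# Take the cross-section at 'mean' on the second column level (level=1).
# The statistic level is dropped, leaving one column per original column.
means = result.xs('mean', axis=1, level=1)
print(means)
```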

Or you could move one level of the column MultiIndex into the row index:

In [40]: result.stack()
Out[40]: 
                   B         C
A                             
False up    2.000000  2.000000
      down  0.000000  0.000000
      mean  4.500000  3.000000
      std   0.707107  1.414214
True  up    3.000000  3.000000
      down  0.000000  0.000000
      mean  2.000000  2.333333
      std   1.000000  1.527525
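stack() with no argument moves the innermost column level (the statistics). Passing level=0 moves the outer level (the original column names) instead, giving one row per (group, column) pair and one column per statistic, which stays narrow even with 15-100 columns. A sketch with the same setup; tidy is an illustrative name, and pandas 2.1 and later may emit a FutureWarning about the stack implementation here (on those versions, future_stack=True opts into the new behaviour):

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

def up(x):
    # number of non-negative values in the group
    return (x >= 0).sum()

def down(x):
    # number of negative values in the group
    return (x < 0).sum()

result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])

# Move the outer column level (B, C, ...) into the row index.
# Rows become (group, column) pairs; the statistics stay as columns.
# Note: the order of the statistic columns may differ between pandas versions.
tidy = result.stack(level=0)
print(tidy)
```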

