How to apply different functions to each group of a pandas groupby?

If I have a dataframe as follows:

import numpy as np
import pandas as pd
df2 = pd.DataFrame({'type':['A', 'A', 'B', 'B', 'C', 'C'], 'value':np.random.randn(6)})
>>> df2
  type     value
0    A -1.136014
1    A -0.715392
2    B -1.961665
3    B -0.525517
4    C  1.358249
5    C  0.652092

I want to group the dataframe by the column 'type' and apply a different function to each group: say, min for the group with type A, max for the group with type B, and mean for the group with type C.

EDIT 2014-08-05 12:00 GMT+8:

Some really nice answers have been provided. But my reason for using groupby is that I want the results in a single dataframe that looks like this:

  type     value
0    A -1.136014
1    B -0.525517
2    C  1.005171

Any help is appreciated~

Upvoted abarnert's answer, because it's a good one.

On the other hand, in order to answer OP's question according to OP's specification:

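# Iterating over a groupby yields (group label, sub-dataframe) pairs.
for name, group in df2.groupby('type'):
    print(group)
    # Aggregate only the numeric 'value' column, not the 'type' labels.
    if name == 'A':
        print(group['value'].min())
    elif name == 'B':
        print(group['value'].max())
    elif name == 'C':
        print(group['value'].mean())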
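The loop above only prints each value. To collect the results into one dataframe, as the question's edit asks, here is a minimal sketch. It assumes a dict, here called `funcs` (a hypothetical name, not from the original post), that maps each group label to the aggregation wanted for it. The values are copied from the question's printed `df2`.

```python
import pandas as pd

# Values copied from the question's printed df2.
df2 = pd.DataFrame({'type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'value': [-1.136014, -0.715392, -1.961665,
                              -0.525517, 1.358249, 0.652092]})

# Hypothetical mapping: group label -> name of the aggregation to apply.
funcs = {'A': 'min', 'B': 'max', 'C': 'mean'}

# Build one row per group, applying that group's own function to 'value'.
rows = [{'type': name, 'value': group['value'].agg(funcs[name])}
        for name, group in df2.groupby('type')]
result = pd.DataFrame(rows)
print(result)
```

A group label missing from `funcs` would raise a `KeyError`, so the mapping has to cover every value in 'type'.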

That said, I would recommend simply computing every statistic for every group, since it's easy enough anyway. This is the intent behind a groupby operation.

In [5]: summary = pd.DataFrame()

In [6]: summary['mean'] = df2.groupby('type').mean()['value']

In [7]: summary['min'] = df2.groupby('type').min()['value']

In [8]: summary['max'] = df2.groupby('type').max()['value']

summary will look like this:

In [9]: summary
Out[9]: 
          mean       min       max
type                              
A     0.440490  0.231633  0.649346
B     0.172303  0.023094  0.321513
C     0.669650 -0.373361  1.712662
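The same summary table can probably be built in a single pass by handing a list of function names to `agg`. This is a sketch rather than the answer's own code, and the seeded random generator is an assumption added only to make the example reproducible.

```python
import numpy as np
import pandas as pd

# Same shape as the question's df2; the seed is an assumption for reproducibility.
rng = np.random.default_rng(0)
df2 = pd.DataFrame({'type': ['A', 'A', 'B', 'B', 'C', 'C'],
                    'value': rng.standard_normal(6)})

# One groupby pass; each listed function becomes a column of the result.
summary = df2.groupby('type')['value'].agg(['mean', 'min', 'max'])
print(summary)
```

Selecting the 'value' column before aggregating keeps the result to one column per statistic, indexed by 'type'.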

Why even use groupby here? It's just getting in the way, and you don't want to do anything with the groups as a whole. So why not just select each group manually?

>>> df2[df2.type=='A']['value'].min()
-1.4442888428898644
>>> df2[df2.type=='B']['value'].max()
1.0361392902054989
>>> df2[df2.type=='C']['value'].mean()
0.89822391958453074
