python pandas：将不同的聚合函数应用于不同的列

Question

I am trying to understand what the equivalent of this simple SQL statement would be: 我试图理解这个简单的SQL语句的等价物是什么：

select mykey, sum(Field1) as sum_of_field1, avg(Field1) as avg_field1, min(field2) as min_field2
from df
group by mykey

I understand I can passa a dictionary to the agg() function: 我明白我可以将字典传递给agg（）函数：

  f = {'Field1':'sum',
         'Field2':['max','mean'],
         'Field3':['min','mean','count'],
         'Field4':'count'
         }

    grouped = df.groupby('mykey').agg(f)

However, the resulting column names seem to be chosen by pandas automatically: ('Field1','sum') etc. 但是，结果列名似乎是由pandas自动选择的：（ ('Field1','sum')等。

Is there a way to pass strings for column names, so that the field is not ('Field1','sum') but something I can choose, like sum_of_field1 ? 有没有办法为列名传递字符串，所以字段不是('Field1','sum')但我能选择的东西，如sum_of_field1？

Thanks. 谢谢。 I looked at the docs here: http://pandas.pydata.org/pandas-docs/stable/groupby.html but couldn't quite find an answer. 我查看了这里的文档： http ： //pandas.pydata.org/pandas-docs/stable/groupby.html但是找不到答案。

Answer 1

As of pandas 0.25, this is possible with a "Named aggregation" . 从pandas 0.25开始，这可以通过“命名聚合”来实现。

In [79]: animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
   ....:                         'height': [9.1, 6.0, 9.5, 34.0],
   ....:                         'weight': [7.9, 7.5, 9.9, 198.0]})
   ....: 

In [80]: animals
Out[80]: 
  kind  height  weight
0  cat     9.1     7.9
1  dog     6.0     7.5
2  cat     9.5     9.9
3  dog    34.0   198.0

In [82]: animals.groupby("kind").agg(
   ....:     min_height=('height', 'min'),
   ....:     max_height=('height', 'max'),
   ....:     average_weight=('weight', np.mean),
   ....: )
   ....: 
Out[82]: 
      min_height  max_height  average_weight
kind                                        
cat          9.1         9.5            8.90
dog          6.0        34.0          102.75

The previously deprecated version follows: 以前弃用的版本如下：

You can pass a dictionary of dictionaries to .agg mapping {column: {name: aggfunc}} , for example 例如，您可以将字典字典传递给.agg mapping {column: {name: aggfunc}}

In [46]: df.head()
Out[46]:
   Year  qtr  realgdp  realcons  realinvs  realgovt  realdpi  cpi_u      M1  \
0  1950    1   1610.5    1058.9     198.1     361.0   1186.1   70.6  110.20
1  1950    2   1658.8    1075.9     220.4     366.4   1178.1   71.4  111.75
2  1950    3   1723.0    1131.0     239.7     359.6   1196.5   73.2  112.95
3  1950    4   1753.9    1097.6     271.8     382.5   1210.0   74.9  113.93
4  1951    1   1773.5    1122.8     242.9     421.9   1207.9   77.3  115.08

   tbilrate  unemp      pop     infl  realint
0      1.12    6.4  149.461   0.0000   0.0000
1      1.17    5.6  150.260   4.5071  -3.3404
2      1.23    4.6  151.064   9.9590  -8.7290
3      1.35    4.2  151.871   9.1834  -7.8301
4      1.40    3.5  152.393  12.6160 -11.2160

In [47]: df.groupby('qtr').agg({"realgdp": {"mean_gdp": "mean", "std_gdp": "std"},
                                "unemp": {"mean_unemp": "mean"}})
Out[47]:
         realgdp                   unemp
        mean_gdp      std_gdp mean_unemp
qtr
1    4506.439216  2104.195963   5.694118
2    4546.043137  2121.824090   5.686275
3    4580.507843  2132.897955   5.662745
4    4617.592157  2158.132698   5.654902

The result has a MultiIndex in the columns. 结果在列中有一个MultiIndex。 If you don't want that outer level, you can use .columns.droplevel(0) . 如果您不想要该外层，可以使用.columns.droplevel(0) 。

Answer 2

I agree this is a bit frustrating butI do find chaining with a rename method served my purpose. 我同意这有点令人沮丧，但我发现用rename方法进行链接符合我的目的。 Also, when it gets really complex, I will just reset the column names. 此外，当它变得非常复杂，我只是将重置列名。 It is a MultiIndex so it is immutable, and you should feel comfortable dealing with levels. 它是一个MultiIndex，所以它是不可变的，你应该感觉很舒服处理关卡。

Based on the pandas documentation 基于pandas 文档

The resulting aggregations are named for the functions themselves. 生成的聚合以函数本身命名。 If you need to rename, then you can add in a chained operation for a Series like this 如果需要重命名，则可以为此系列添加链接操作

In [67]: (grouped['C'].agg([np.sum, np.mean, np.std])
   ....:              .rename(columns={'sum': 'foo',
   ....:                               'mean': 'bar',
   ....:                               'std': 'baz'})
   ....: )
   ....: 
Out[67]: 
          foo       bar       baz
A                                
bar  0.392940  0.130980  0.181231
foo -1.796421 -0.359284  0.912265

When there are multiples uses of one function and you want to name it differently, this question of dropping the level and joining the different levels by underscore will help. 当有一个功能的倍数用途和您希望以不同的名字，这个问题由下划线下降水平和加入不同程度的帮助。

If you do find the sql syntax cleaner, there is a library called pandasql that give you this flexibility. 如果你确实发现sql语法更清晰，那么有一个名为pandasql的库可以为你提供这种灵活性。

python pandas：将不同的聚合函数应用于不同的列

问题描述

2 个解决方案

解决方案1
12 已采纳 2015-09-03 12:18:09

解决方案2
1 2018-12-19 03:31:09

python pandas：将不同的聚合函数应用于不同的列

问题描述

2 个解决方案

解决方案1 12 已采纳 2015-09-03 12:18:09

解决方案2 1 2018-12-19 03:31:09

解决方案1
12 已采纳 2015-09-03 12:18:09

解决方案2
1 2018-12-19 03:31:09