python pandas: applying different aggregate functions to different columns

I am trying to understand what the equivalent of this simple SQL statement would be:

select mykey, sum(Field1) as sum_of_field1, avg(Field1) as avg_field1, min(field2) as min_field2
from df
group by mykey

I understand I can pass a dictionary to the agg() function:

    f = {'Field1': 'sum',
         'Field2': ['max', 'mean'],
         'Field3': ['min', 'mean', 'count'],
         'Field4': 'count'}

    grouped = df.groupby('mykey').agg(f)

However, the resulting column names seem to be chosen by pandas automatically: ('Field1','sum') etc.

Is there a way to pass strings for column names, so that the field is not ('Field1','sum') but something I can choose, like sum_of_field1?

Thanks. I looked at the docs here: http://pandas.pydata.org/pandas-docs/stable/groupby.html but couldn't quite find an answer.

As of pandas 0.25, this is possible with "named aggregation".

In [79]: animals = pd.DataFrame({'kind': ['cat', 'dog', 'cat', 'dog'],
   ....:                         'height': [9.1, 6.0, 9.5, 34.0],
   ....:                         'weight': [7.9, 7.5, 9.9, 198.0]})
   ....: 

In [80]: animals
Out[80]: 
  kind  height  weight
0  cat     9.1     7.9
1  dog     6.0     7.5
2  cat     9.5     9.9
3  dog    34.0   198.0

In [82]: animals.groupby("kind").agg(
   ....:     min_height=('height', 'min'),
   ....:     max_height=('height', 'max'),
   ....:     average_weight=('weight', 'mean'),
   ....: )
   ....: 
Out[82]: 
      min_height  max_height  average_weight
kind                                        
cat          9.1         9.5            8.90
dog          6.0        34.0          102.75
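
Applied to the SQL statement in the question, a minimal sketch might look like the following. The sample data is made up for illustration; only the column names `mykey`, `Field1`, and `Field2` come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "mykey": ["a", "a", "b", "b"],
    "Field1": [1.0, 3.0, 10.0, 20.0],
    "Field2": [5, 2, 7, 9],
})

# Named aggregation: output_name=(input_column, aggregation)
result = df.groupby("mykey").agg(
    sum_of_field1=("Field1", "sum"),
    avg_field1=("Field1", "mean"),
    min_field2=("Field2", "min"),
).reset_index()  # turn mykey back into a regular column, as in the SQL result

print(result)
```

Each keyword becomes an output column name, so no renaming step is needed afterwards.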

The older approach, deprecated since pandas 0.20 and removed in pandas 1.0, follows:


You can pass a dictionary of dictionaries to .agg, mapping {column: {name: aggfunc}}, for example:

In [46]: df.head()
Out[46]:
   Year  qtr  realgdp  realcons  realinvs  realgovt  realdpi  cpi_u      M1  \
0  1950    1   1610.5    1058.9     198.1     361.0   1186.1   70.6  110.20
1  1950    2   1658.8    1075.9     220.4     366.4   1178.1   71.4  111.75
2  1950    3   1723.0    1131.0     239.7     359.6   1196.5   73.2  112.95
3  1950    4   1753.9    1097.6     271.8     382.5   1210.0   74.9  113.93
4  1951    1   1773.5    1122.8     242.9     421.9   1207.9   77.3  115.08

   tbilrate  unemp      pop     infl  realint
0      1.12    6.4  149.461   0.0000   0.0000
1      1.17    5.6  150.260   4.5071  -3.3404
2      1.23    4.6  151.064   9.9590  -8.7290
3      1.35    4.2  151.871   9.1834  -7.8301
4      1.40    3.5  152.393  12.6160 -11.2160

In [47]: df.groupby('qtr').agg({"realgdp": {"mean_gdp": "mean", "std_gdp": "std"},
                                "unemp": {"mean_unemp": "mean"}})
Out[47]:
         realgdp                   unemp
        mean_gdp      std_gdp mean_unemp
qtr
1    4506.439216  2104.195963   5.694118
2    4546.043137  2121.824090   5.686275
3    4580.507843  2132.897955   5.662745
4    4617.592157  2158.132698   5.654902

The result has a MultiIndex in the columns. If you don't want that outer level, you can use .columns.droplevel(0).
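
A small sketch of dropping the outer level, using made-up numbers and not the real dataset. Because the nested-dictionary form is deprecated, this version passes a list of functions per column instead, so the inner level holds the function names and not custom labels.

```python
import pandas as pd

# Made-up data for illustration; column names mirror the example above.
df = pd.DataFrame({
    "qtr": [1, 1, 2, 2],
    "realgdp": [100.0, 200.0, 300.0, 500.0],
    "unemp": [5.0, 6.0, 4.0, 7.0],
})

# A dict of lists yields MultiIndex columns:
# ('realgdp', 'mean'), ('realgdp', 'std'), ('unemp', 'max')
grouped = df.groupby("qtr").agg({"realgdp": ["mean", "std"], "unemp": ["max"]})

# Drop the outer (column name) level, keeping only the function names.
grouped.columns = grouped.columns.droplevel(0)

print(grouped.columns.tolist())  # ['mean', 'std', 'max']
```

Note that dropping the outer level produces duplicate column names if two source columns use the same function, which is why this sketch uses a different function for `unemp`.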

I agree this is a bit frustrating, but I do find that chaining a rename method serves my purpose. Also, when it gets really complex, I just reset the column names. The columns are a MultiIndex, which is immutable, so you should get comfortable dealing with levels.

Based on the pandas documentation:

The resulting aggregations are named for the functions themselves. If you need to rename, then you can add in a chained operation for a Series like this:

In [67]: (grouped['C'].agg([np.sum, np.mean, np.std])
   ....:              .rename(columns={'sum': 'foo',
   ....:                               'mean': 'bar',
   ....:                               'std': 'baz'})
   ....: )
   ....: 
Out[67]: 
          foo       bar       baz
A                                
bar  0.392940  0.130980  0.181231
foo -1.796421 -0.359284  0.912265

When one function is used multiple times and you want to name each result differently, this question on dropping the level and joining the different levels with an underscore will help.
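
One way to do that join, as a sketch on made-up data (the column names are borrowed from the question; the values are hypothetical):

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "mykey": ["a", "a", "b"],
    "Field1": [1, 2, 3],
    "Field2": [4, 5, 6],
})

# 'mean' is used for two different columns, so the function name alone is ambiguous.
grouped = df.groupby("mykey").agg({"Field1": ["sum", "mean"], "Field2": ["mean"]})

# Flatten each (column, function) pair into a single 'column_function' string.
grouped.columns = ["_".join(col) for col in grouped.columns]

print(grouped.columns.tolist())  # ['Field1_sum', 'Field1_mean', 'Field2_mean']
```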

If you do find the SQL syntax cleaner, there is a library called pandasql that gives you this flexibility.


