
groupby() and agg() in pandas

Here is the DataFrame named 'census':

   SUMLEV  REGION  COUNTY      STNAME          CTYNAME  CENSUS2010POP  ESTIMATESBASE2010
0      50       3       1     Alabama   Autauga County          54571              54571
1      50       3       3     Alabama   Baldwin County         182265             182265
2      50       3       5     Alabama   Barbour County          27457              27457
3      50       4       3     Arizona   Cochise County         131346             131357
4      50       4       5     Arizona  Coconino County         134421             134437
5      50       4       7     Arizona      Gila County          53597              53597
6      50       4      21  California     Glenn County          28122              28122
7      50       4      23  California  Humboldt County         134623             134623
8      50       4      25  California  Imperial County         174528              17452
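To make the question reproducible, the sample table can be rebuilt as a DataFrame. This is a minimal sketch that copies only the nine rows shown; the last ESTIMATESBASE2010 value is copied exactly as it appears in the table, even though it looks truncated.

```python
import pandas as pd

# Rebuild the sample 'census' table row for row (values copied from the table)
census = pd.DataFrame({
    "SUMLEV": [50] * 9,
    "REGION": [3, 3, 3, 4, 4, 4, 4, 4, 4],
    "COUNTY": [1, 3, 5, 3, 5, 7, 21, 23, 25],
    "STNAME": ["Alabama"] * 3 + ["Arizona"] * 3 + ["California"] * 3,
    "CTYNAME": ["Autauga County", "Baldwin County", "Barbour County",
                "Cochise County", "Coconino County", "Gila County",
                "Glenn County", "Humboldt County", "Imperial County"],
    "CENSUS2010POP": [54571, 182265, 27457, 131346, 134421, 53597,
                      28122, 134623, 174528],
    "ESTIMATESBASE2010": [54571, 182265, 27457, 131357, 134437, 53597,
                          28122, 134623, 17452],  # last value as shown above
})

print(census.shape)  # (9, 7)
```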

I want to calculate the sum and the average of 'CENSUS2010POP' for each state ('STNAME') and display the result as a DataFrame.

Here's my code:

import numpy as np

census.set_index('STNAME')  # note: returns a new DataFrame; 'census' itself is not modified
census.groupby(level=0).CENSUS2010POP.agg({'avg': np.mean, 'sum': np.sum}).head()

However, it raises SpecificationError: nested renamer is not supported

I also tried:

census.groupby('STNAME').CENSUS2010POP.agg({'avg':np.mean, 'sum':np.sum})

It raises the same error.
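Both attempts fail for the same reason, and the first one has an additional problem: set_index() returns a new DataFrame instead of changing census, so groupby(level=0) groups by the original row numbers rather than by state. The sketch below uses a three-row subset of the sample data for brevity and assumes pandas 1.0 or later, where passing a dict of {new_name: function} to a single-column groupby raises SpecificationError.

```python
import pandas as pd

# Three-row subset of the sample data (only the columns needed here)
census = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Arizona"],
    "CENSUS2010POP": [54571, 182265, 131346],
})

# set_index returns a new frame; without assignment, census keeps its RangeIndex,
# so groupby(level=0) would put every row in its own group
census.set_index("STNAME")
print(census.index.tolist())  # [0, 1, 2]

# The dict-of-renamers form on a single column is rejected by pandas >= 1.0
message = None
try:
    census.groupby("STNAME")["CENSUS2010POP"].agg({"avg": "mean", "sum": "sum"})
except Exception as exc:  # pandas raises SpecificationError here
    message = str(exc)
print(message)  # nested renamer is not supported
```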

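Because only one column is being aggregated, the dict form {new_name: function} is not accepted (nested renaming was removed in pandas 1.0). Pass a list of (name, function) tuples instead: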

# string aliases avoid the FutureWarning that pandas >= 2.1 emits for np.mean / np.sum
df = census.groupby('STNAME').CENSUS2010POP.agg([('avg', 'mean'), ('sum', 'sum')]).head()
print(df)
                      avg     sum
STNAME                           
Alabama      88097.666667  264293
Arizona     106454.666667  319364
California  112424.333333  337273
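The same tuple list also fixes the original groupby(level=0) attempt, as long as the result of set_index() is assigned. A minimal sketch, assuming only the STNAME and CENSUS2010POP columns from the sample table; by_state is an illustrative variable name.

```python
import pandas as pd

# STNAME and CENSUS2010POP columns from the sample table
census = pd.DataFrame({
    "STNAME": ["Alabama"] * 3 + ["Arizona"] * 3 + ["California"] * 3,
    "CENSUS2010POP": [54571, 182265, 27457, 131346, 134421, 53597,
                      28122, 134623, 174528],
})

# Assign the new frame so that level=0 really refers to STNAME
by_state = census.set_index("STNAME")

# Each (output_name, function) tuple becomes one named result column
df = by_state.groupby(level=0)["CENSUS2010POP"].agg([("avg", "mean"), ("sum", "sum")])
print(df)
```

The printed table should match the one above, since grouping on the STNAME index level and grouping on the STNAME column produce the same groups.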

Or use named aggregation (pandas 0.25 and later):

census.groupby('STNAME').agg(avg=('CENSUS2010POP', 'mean'),
                             sum=('CENSUS2010POP', 'sum')).head()
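Named aggregation also lets each output column draw on a different source column, which the tuple-list form on a single column cannot do. A sketch assuming pandas 0.25 or later; pd.NamedAgg is the explicit equivalent of the plain (column, function) tuple, and base_max is a hypothetical extra output added only to illustrate a second source column.

```python
import pandas as pd

# STNAME plus the two population columns from the sample table
census = pd.DataFrame({
    "STNAME": ["Alabama"] * 3 + ["Arizona"] * 3 + ["California"] * 3,
    "CENSUS2010POP": [54571, 182265, 27457, 131346, 134421, 53597,
                      28122, 134623, 174528],
    "ESTIMATESBASE2010": [54571, 182265, 27457, 131357, 134437, 53597,
                          28122, 134623, 17452],
})

# Keyword = output column name; value = (source column, aggregation)
summary = census.groupby("STNAME").agg(
    avg=pd.NamedAgg(column="CENSUS2010POP", aggfunc="mean"),
    sum=("CENSUS2010POP", "sum"),
    base_max=("ESTIMATESBASE2010", "max"),  # hypothetical: a second source column
)
print(summary)
```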
