How to average a column by group using Python pandas?

Question

My input looks like this:

NAME            Geoid    Year   QTR Index 
'Abilene, TX    10180   1978    3   0
'Abilene, TX    10180   1978    4   0
'Abilene, TX    10180   1979    1   0
'Abilene, TX    10180   1979    2   0
'Decatur, IL    19500   1998    1   110.51
'Decatur, IL    19500   1998    2   110.48
'Decatur, IL    19500   1998    3   113.01
'Decatur, IL    19500   1998    4   114.16
'Fairbanks, AK  21820   1990    1   63.74
'Fairbanks, AK  21820   1990    2   70.68
'Fairbanks, AK  21820   1990    3   83.56
'Fairbanks, AK  21820   1990    4   83.95

The MySQL query I want to convert to Python is:

   SELECT  geoid, name, YEAR, AVG(index)
   FROM table_1
   WHERE geoid>0
   GROUP BY geoid, metro_name, YEAR;

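From what I read online, the pandas equivalent of AVG is mean, but when I use it, it gives me a single value for the whole column.

Related: pandas get column average/mean

What I want instead is one average per name, Geoid, and year (averaged over the quarters), like this: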

Name            Geoid   YEAR    AVG(index)
'Abilene, TX    10180   1978    0
'Abilene, TX    10180   1979    0
'Decatur, IL    19500   1998    111.75
'Fairbanks, AK  21820   1990    74.9875

How can I do this?

Answer 1

First filter the rows with query or boolean indexing, then groupby the key columns and aggregate with mean:

df1 = df.query('Geoid > 0').groupby(['NAME','Geoid','Year'], as_index=False)['Index'].mean()
print (df1)
             NAME  Geoid  Year     Index
0    'Abilene, TX  10180  1978    0.0000
1    'Abilene, TX  10180  1979    0.0000
2    'Decatur, IL  19500  1998  112.0400
3  'Fairbanks, AK  21820  1990   75.4825

df1 = df[df['Geoid'] > 0].groupby(['NAME','Geoid','Year'], as_index=False)['Index'].mean()
print (df1)
             NAME  Geoid  Year     Index
0    'Abilene, TX  10180  1978    0.0000
1    'Abilene, TX  10180  1979    0.0000
2    'Decatur, IL  19500  1998  112.0400
3  'Fairbanks, AK  21820  1990   75.4825
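
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the sample rows from the question are loaded into a DataFrame named df with the columns NAME, Geoid, Year, QTR and Index; the variable names overall, by_query and by_mask are only illustrative. It also shows why calling mean directly on the column returns a single value.

```python
import pandas as pd

# Sample rows copied from the question (the leading quote in NAME is kept as-is).
rows = [
    ("'Abilene, TX", 10180, 1978, 3, 0.0),
    ("'Abilene, TX", 10180, 1978, 4, 0.0),
    ("'Abilene, TX", 10180, 1979, 1, 0.0),
    ("'Abilene, TX", 10180, 1979, 2, 0.0),
    ("'Decatur, IL", 19500, 1998, 1, 110.51),
    ("'Decatur, IL", 19500, 1998, 2, 110.48),
    ("'Decatur, IL", 19500, 1998, 3, 113.01),
    ("'Decatur, IL", 19500, 1998, 4, 114.16),
    ("'Fairbanks, AK", 21820, 1990, 1, 63.74),
    ("'Fairbanks, AK", 21820, 1990, 2, 70.68),
    ("'Fairbanks, AK", 21820, 1990, 3, 83.56),
    ("'Fairbanks, AK", 21820, 1990, 4, 83.95),
]
df = pd.DataFrame(rows, columns=["NAME", "Geoid", "Year", "QTR", "Index"])

# Calling mean on the whole column collapses all rows into one scalar,
# which is the "single value" described in the question.
overall = df["Index"].mean()

# Filter (WHERE), group (GROUP BY), then average (AVG): one row per group.
by_query = (
    df.query("Geoid > 0")
      .groupby(["NAME", "Geoid", "Year"], as_index=False)["Index"]
      .mean()
)

# Same result, with boolean indexing in place of query.
by_mask = (
    df[df["Geoid"] > 0]
      .groupby(["NAME", "Geoid", "Year"], as_index=False)["Index"]
      .mean()
)

print(overall)
print(by_query)
```

Passing as_index=False keeps NAME, Geoid and Year as ordinary columns, which is closer to what the SQL SELECT returns than the default MultiIndex result.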

Answer 1: 3 votes, accepted, 2017-10-09 13:57:01
