
Pandas: conditional slicing after groupby mean

This must have been asked before, but I couldn't find a solution; sorry if it's a duplicate! I grouped a dataframe with a datetime index (called 'time') by year and month and took the mean with df = df.groupby([df.index.year, df.index.month]).mean() , which gave the following:

               0
time    time    

2000    1   0.245888
    2   0.579210
    3   0.519101
    4   1.724130
    5   2.909998
    6   6.754044
    7   5.654214
    8   0.972300
    9   0.207180
    10  -0.608038
    11  -2.271975
    12  -9.407542
2001    1   -4.206406
    2   0.339256
    3   2.447668
    4   2.159161
    5   2.014476
    6   4.495522
    7   2.130116
    8   4.280266
    9   2.329842
    10  -1.560461
    11  -2.232722
    12  -2.182392

It has two index levels, both called 'time', corresponding to year and month. Now I want to slice by month (create a new dataframe with just month=1, or from month=6 to 8, etc.), but I'm not sure how to operate on this index.

I want to do something like:

df.loc[(df.index.month == 1)]
df.loc[(df.index.month == 1) | (df.index.month == 2)]
df.loc[(df.index.month >= 1) & (df.index.month <= 6)]

etc.

Doing this gives AttributeError: 'MultiIndex' object has no attribute 'month' (understandably). I tried renaming the index with df.rename(['year', 'month']) , which gives AttributeError list object is not callable . I thought perhaps I need to reset the index so it is in a datetime format again, but df.reset_index() gives ValueError cannot insert time .

df.index gives:

MultiIndex(levels=[[2000, 2001], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
           codes=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
           names=['time', 'time'])

Edit - 1. Edited to say I want a more flexible slicing operation, not just getting a particular month. 2. The original df looked like:

             0
time    
2000-01-01  1.427332
2000-01-02  1.468405
2000-01-03  1.525916
2000-01-04  1.399915
2000-01-05  1.192117
2000-01-06  1.191234
2000-01-07  1.431109
2000-01-08  1.687709
2000-01-09  1.876527
2000-01-10  1.871062
2000-01-11  1.759002
2000-01-12  1.553009
2000-01-13  1.336487
2000-01-14  1.105376
2000-01-15  0.732866
2000-01-16  0.259119
2000-01-17  -0.003458
2000-01-18  -0.180170
2000-01-19  -0.275862
2000-01-20  -0.580456
2000-01-21  -0.800049
2000-01-22  -0.990277
2000-01-23  -1.139482
2000-01-24  -1.264528
2000-01-25  -1.378858
2000-01-26  -1.516954
2000-01-27  -1.394427
2000-01-28  -1.371782
2000-01-29  -1.337087
2000-01-30  -1.120146
... ...
2001-12-02  -4.521928
2001-12-03  -4.499393
2001-12-04  -4.425628
2001-12-05  -4.270720
2001-12-06  -4.286983
2001-12-07  -4.141410
2001-12-08  -3.886460
2001-12-09  -4.008633
2001-12-10  -3.772096
2001-12-11  -3.261724
2001-12-12  -3.271314
2001-12-13  -3.306891
2001-12-14  -3.111070
2001-12-15  -2.694092
2001-12-16  -2.063524
2001-12-17  -1.593670
2001-12-18  -1.279061
2001-12-19  -0.957185
2001-12-20  -0.616801
2001-12-21  -0.316757
2001-12-22  -0.292797
2001-12-23  -0.226818
2001-12-24  -0.196901
2001-12-25  -0.237203
2001-12-26  -0.221769
2001-12-27  -0.167911
2001-12-28  -0.050808
2001-12-29  -0.044765
2001-12-30  -0.384740
2001-12-31  -0.913277
730 rows × 1 columns

First, you can use rename to give the grouping keys distinct names:

df = df.groupby([df.index.year.rename('year'), 
                 df.index.month.rename('month')]).mean()

Or use rename_axis to set the MultiIndex level names after grouping:

df = df.groupby([df.index.year, df.index.month]).mean().rename_axis(('year','month'))
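Once the two levels have distinct names, the reset_index() call from the question should also work, since the name clash on 'time' is gone. Here is a minimal sketch on synthetic data; the frame `daily`, its made-up values, and the names `monthly` and `flat` are assumptions for illustration, not the asker's data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the original daily data (values are made up).
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D", name="time")
daily = pd.DataFrame({0: np.arange(len(idx), dtype=float)}, index=idx)

# Group by year and month, naming each key so the levels are distinct.
monthly = daily.groupby(
    [daily.index.year.rename("year"), daily.index.month.rename("month")]
).mean()

print(list(monthly.index.names))

# With distinct level names, reset_index() can turn both levels into columns.
flat = monthly.reset_index()
print(flat.head())
```

After this, `flat` has ordinary 'year' and 'month' columns, so plain column filters such as flat[flat['month'] == 1] are another option.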

To select a single month, use DataFrame.xs :

df1 = df.xs(1, axis=0, level=1)
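Note that xs drops the selected level by default, so the result is indexed by year only. A small sketch of both behaviours, assuming the levels have been named 'year' and 'month' as above (the frame here is synthetic and its values are placeholders):

```python
import numpy as np
import pandas as pd

# Synthetic monthly frame with named levels; values are placeholders.
index = pd.MultiIndex.from_product(
    [[2000, 2001], range(1, 13)], names=["year", "month"]
)
df = pd.DataFrame({0: np.arange(24, dtype=float)}, index=index)

# Default: the 'month' level is dropped, leaving only 'year' in the index.
january = df.xs(1, level="month")

# drop_level=False keeps both levels in the result.
january_full = df.xs(1, level="month", drop_level=False)

print(january)
print(january_full)
```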

If you want to filter with boolean masks as in your attempt, use get_level_values to select the second (month) level:

df.loc[(df.index.get_level_values(1) == 1)]
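The same idea covers the more flexible conditions from the question. The sketch below assumes the levels are named 'year' and 'month' and uses a synthetic frame with placeholder values; the pd.IndexSlice variant is an alternative that relies on the index being sorted:

```python
import numpy as np
import pandas as pd

# Synthetic monthly frame with named levels; values are placeholders.
index = pd.MultiIndex.from_product(
    [[2000, 2001], range(1, 13)], names=["year", "month"]
)
df = pd.DataFrame({0: np.arange(24, dtype=float)}, index=index)

# Pull the month level out once and build boolean masks from it.
month = df.index.get_level_values("month")

jan_or_feb = df.loc[month.isin([1, 2])]
first_half = df.loc[(month >= 1) & (month <= 6)]

# Label-based alternative for a contiguous range (inclusive on both ends);
# this requires a sorted MultiIndex.
june_to_august = df.loc[pd.IndexSlice[:, 6:8], :]

print(jan_or_feb)
print(first_half)
print(june_to_august)
```

The mask approach works for any condition you can express on the month values, while IndexSlice is only convenient for contiguous label ranges.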
