Pandas: conditional slicing after groupby mean
This must have been asked before, but I couldn't find a solution. Sorry if this is a duplicate! I did a groupby on year and month on a dataframe with a datetime index (called 'time') and applied a mean:
df = df.groupby([df.index.year, df.index.month]).mean()
which gave the following:
0
time time
2000 1 0.245888
2 0.579210
3 0.519101
4 1.724130
5 2.909998
6 6.754044
7 5.654214
8 0.972300
9 0.207180
10 -0.608038
11 -2.271975
12 -9.407542
2001 1 -4.206406
2 0.339256
3 2.447668
4 2.159161
5 2.014476
6 4.495522
7 2.130116
8 4.280266
9 2.329842
10 -1.560461
11 -2.232722
12 -2.182392
It has two index levels, both called 'time', corresponding to year and month. Now I want to slice by month (create a new dataframe with just month=1, or from month=6 to 8, etc.), but I'm not sure how to operate on this index.
I want to do something like:
df.loc[(df.index.month == 1)]
df.loc[(df.index.month == 1) | (df.index.month == 2)]
df.loc[(df.index.month >= 1) & (df.index.month <= 6)]
etc.
Doing this gives AttributeError: 'MultiIndex' object has no attribute 'month' (understandably). I tried renaming the index with df.rename(['year', 'month']), which gives AttributeError list object is not callable. I thought perhaps I need to reset the index so it is in a datetime format again, but df.reset_index() gives ValueError cannot insert time.
df.index gives:
MultiIndex(levels=[[2000, 2001], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
codes=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
names=['time', 'time'])
Edit:
1. Edited to say I want a more flexible slicing operation, not just selecting a particular month.
2. The original df looked like:
0
time
2000-01-01 1.427332
2000-01-02 1.468405
2000-01-03 1.525916
2000-01-04 1.399915
2000-01-05 1.192117
2000-01-06 1.191234
2000-01-07 1.431109
2000-01-08 1.687709
2000-01-09 1.876527
2000-01-10 1.871062
2000-01-11 1.759002
2000-01-12 1.553009
2000-01-13 1.336487
2000-01-14 1.105376
2000-01-15 0.732866
2000-01-16 0.259119
2000-01-17 -0.003458
2000-01-18 -0.180170
2000-01-19 -0.275862
2000-01-20 -0.580456
2000-01-21 -0.800049
2000-01-22 -0.990277
2000-01-23 -1.139482
2000-01-24 -1.264528
2000-01-25 -1.378858
2000-01-26 -1.516954
2000-01-27 -1.394427
2000-01-28 -1.371782
2000-01-29 -1.337087
2000-01-30 -1.120146
... ...
2001-12-02 -4.521928
2001-12-03 -4.499393
2001-12-04 -4.425628
2001-12-05 -4.270720
2001-12-06 -4.286983
2001-12-07 -4.141410
2001-12-08 -3.886460
2001-12-09 -4.008633
2001-12-10 -3.772096
2001-12-11 -3.261724
2001-12-12 -3.271314
2001-12-13 -3.306891
2001-12-14 -3.111070
2001-12-15 -2.694092
2001-12-16 -2.063524
2001-12-17 -1.593670
2001-12-18 -1.279061
2001-12-19 -0.957185
2001-12-20 -0.616801
2001-12-21 -0.316757
2001-12-22 -0.292797
2001-12-23 -0.226818
2001-12-24 -0.196901
2001-12-25 -0.237203
2001-12-26 -0.221769
2001-12-27 -0.167911
2001-12-28 -0.050808
2001-12-29 -0.044765
2001-12-30 -0.384740
2001-12-31 -0.913277
730 rows × 1 columns
First, you can use rename on each grouping key so the resulting index levels get distinct names:
df = df.groupby([df.index.year.rename('year'),
                 df.index.month.rename('month')]).mean()
Or use rename_axis to set the MultiIndex names after aggregating:
df = df.groupby([df.index.year, df.index.month]).mean().rename_axis(('year','month'))
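Once the two levels have distinct names, reset_index should no longer collide on 'time'. A minimal sketch, assuming a synthetic daily frame as a stand-in for the original data (the names daily, monthly and flat are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original data: one value per day, 2000-2001.
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D", name="time")
daily = pd.DataFrame({0: np.arange(len(idx), dtype=float)}, index=idx)

# Group by year and month, then give the two index levels distinct names.
monthly = (
    daily.groupby([daily.index.year, daily.index.month])
    .mean()
    .rename_axis(["year", "month"])
)

# With unique level names, the levels can be moved back into columns.
flat = monthly.reset_index()
print(list(monthly.index.names))
print(flat.columns.tolist())
```

After this, year and month are ordinary columns in flat, so plain boolean filtering on flat["month"] is also an option.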
To select a single month, use DataFrame.xs:
df1 = df.xs(1, axis=0, level=1)
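As a sketch of how xs behaves, assuming the levels were named 'year' and 'month' as above and using synthetic data in place of the original: by default xs drops the selected level, and drop_level=False keeps both levels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original data: one value per day, 2000-2001.
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D", name="time")
daily = pd.DataFrame({0: np.arange(len(idx), dtype=float)}, index=idx)

monthly = daily.groupby(
    [daily.index.year.rename("year"), daily.index.month.rename("month")]
).mean()

# Select January of every year; the 'month' level is dropped from the result.
jan = monthly.xs(1, level="month")

# Same selection, but keep the full (year, month) MultiIndex.
jan_full = monthly.xs(1, level="month", drop_level=False)

print(jan)
print(jan_full)
```

Note that xs selects one key per level, so it suits "just month=1" but not ranges of months.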
If you want to filter with boolean masks as in your attempt, use get_level_values to select the second level:
df.loc[(df.index.get_level_values(1) == 1)]
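The same idea extends to the more flexible conditions from the question. A minimal sketch on synthetic data, assuming the levels are named 'year' and 'month' (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original data: one value per day, 2000-2001.
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D", name="time")
daily = pd.DataFrame({0: np.arange(len(idx), dtype=float)}, index=idx)

monthly = daily.groupby(
    [daily.index.year.rename("year"), daily.index.month.rename("month")]
).mean()

# Pull the month level out once and build boolean masks from it.
months = monthly.index.get_level_values("month")

jan_feb = monthly.loc[months.isin([1, 2])]             # month == 1 or 2
summer = monthly.loc[(months >= 6) & (months <= 8)]    # months 6 through 8

# Alternative for contiguous ranges: label-based slicing on the second level.
# This relies on the index being sorted, which a groupby result is by default.
first_half = monthly.loc[pd.IndexSlice[:, 1:6], :]

print(jan_feb)
print(summer)
print(first_half)
```

The mask approach works for arbitrary conditions, while IndexSlice is limited to label ranges but reads more compactly.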