How to use pandas Grouper to get sum of values within each hour
I have the following table:
Hora_Retiro count_uses
0 00:00:18 1
1 00:00:34 1
2 00:02:27 1
3 00:03:13 1
4 00:06:45 1
... ... ...
748700 23:58:47 1
748701 23:58:49 1
748702 23:59:11 1
748703 23:59:47 1
748704 23:59:56 1
I want to group all values within each hour, so I can see the total number of uses per hour (00:00:00 - 23:00:00).
I have the following code:
hora_pico_aug= hora_pico.groupby(pd.Grouper(key="Hora_Retiro",freq='H')).count()
The Hora_Retiro column is of timedelta64[ns] type. The code above gives the following output:
count_uses
Hora_Retiro
00:00:02 2566
01:00:02 602
02:00:02 295
03:00:02 5
04:00:02 10
05:00:02 4002
06:00:02 16075
07:00:02 39410
08:00:02 76272
09:00:02 56721
10:00:02 36036
11:00:02 32011
12:00:02 33725
13:00:02 41032
14:00:02 50747
15:00:02 50338
16:00:02 42347
17:00:02 54674
18:00:02 76056
19:00:02 57958
20:00:02 34286
21:00:02 22509
22:00:02 13894
23:00:02 7134
However, the index column starts at 00:00:02, and I want it to start at 00:00:00 and then proceed in one-hour intervals. Something like this:
count_uses
Hora_Retiro
00:00:00 2565
01:00:00 603
02:00:00 295
03:00:00 5
04:00:00 10
05:00:00 4002
06:00:00 16075
07:00:00 39410
08:00:00 76272
09:00:00 56721
10:00:00 36036
11:00:00 32011
12:00:00 33725
13:00:00 41032
14:00:00 50747
15:00:00 50338
16:00:00 42347
17:00:00 54674
18:00:00 76056
19:00:00 57958
20:00:00 34286
21:00:00 22509
22:00:00 13894
23:00:00 7134
How can I make it start at 00:00:00?
Thanks for the help!
You can create an hour column from the Hora_Retiro column.
# Hora_Retiro is timedelta64[ns], so take the hours from .dt.components
# (.dt.hour is only available on datetime columns)
df['hour'] = df['Hora_Retiro'].dt.components.hours
And then groupby on the basis of hour:
gpby_df = df.groupby('hour')['count_uses'].sum().reset_index()
gpby_df['hour'] = pd.to_datetime(gpby_df['hour'], format='%H').dt.time
gpby_df.columns = ['Hora_Retiro', 'sum_count_uses']
gpby_df
gives
Hora_Retiro sum_count_uses
0 00:00:00 14
1 09:00:00 1
2 10:00:00 2
3 20:00:00 2
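The hour-column approach above can be sketched end to end as follows. This is a minimal sketch on made-up sample rows, assuming Hora_Retiro is timedelta64[ns] as stated in the question; the `reindex` step and the name `per_hour` are additions for illustration, used here so that hours with no rows still appear with a count of 0.

```python
import pandas as pd

# Hypothetical sample data standing in for the real table
df = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(
        ["00:00:18", "00:00:34", "00:45:10", "09:15:00", "10:59:59", "20:30:00"]
    ),
    "count_uses": [1, 1, 1, 1, 1, 1],
})

# For a timedelta column, the hour part comes from .dt.components
df["hour"] = df["Hora_Retiro"].dt.components.hours

per_hour = (
    df.groupby("hour")["count_uses"]
      .sum()
      .reindex(range(24), fill_value=0)  # keep hours that have no rows
)

# Convert the integer hour back into a timedelta label (00:00:00, 01:00:00, ...)
per_hour.index = pd.to_timedelta(per_hour.index, unit="h")
per_hour.index.name = "Hora_Retiro"

print(per_hour[per_hour > 0])
```

Because the labels are built from the integer hour, every bin starts exactly on the hour regardless of the timestamp in the first row.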
I assume that the Hora_Retiro column in your DataFrame is of Timedelta type. It is not datetime, since in that case the date part would also be printed.
Indeed, your code creates groups starting at the minute / second taken from the first row.
To group by "full hours", floor each value in this column to the hour and group by the floored value. Use floor rather than round here: round would move any value past the half hour (e.g. 00:45:10) into the next hour.
The code to do it is:
# 'h' is the hour alias; older pandas versions also accept 'H'
hora_pico.groupby(hora_pico.Hora_Retiro.apply(
    lambda tt: tt.floor('h'))).count_uses.count()
However, I advise you to decide what you want to count: rows, or values in the count_uses column. In the second case, replace the count function with sum.
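To make the grouping key and the count-versus-sum distinction concrete, here is a small sketch on made-up data. The frame contents and the names `floored`, `rounded`, `rows_per_hour`, `uses_per_hour` and `rounded_rows` are hypothetical; it uses the vectorized `.dt.floor` / `.dt.round` accessors instead of `apply`, which should behave the same way on a timedelta column.

```python
import pandas as pd

# Hypothetical sample data; one row has count_uses == 2 so that
# count (rows) and sum (values) give different answers
hora_pico = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(["00:00:18", "00:45:10", "01:10:00", "01:50:00"]),
    "count_uses": [1, 1, 2, 1],
})

# Two candidate grouping keys: truncate to the hour vs. nearest hour
floored = hora_pico["Hora_Retiro"].dt.floor("h")
rounded = hora_pico["Hora_Retiro"].dt.round("h")

# Number of rows in each full hour
rows_per_hour = hora_pico.groupby(floored)["count_uses"].count()
# Total of the count_uses values in each full hour
uses_per_hour = hora_pico.groupby(floored)["count_uses"].sum()
# Same row count, but keyed on the nearest hour instead
rounded_rows = hora_pico.groupby(rounded)["count_uses"].count()

print(rows_per_hour)
print(uses_per_hour)
print(rounded_rows)
```

With the floored key, 00:45:10 stays in the 00:00:00 bin; with the rounded key it lands in 01:00:00, and 01:50:00 lands in 02:00:00, which is not what an "uses per hour" table usually means.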