
How to use pandas Grouper to get sum of values within each hour

I have the following table:

         Hora_Retiro  count_uses
0         00:00:18           1
1         00:00:34           1
2         00:02:27           1
3         00:03:13           1
4         00:06:45           1
...            ...         ...
748700    23:58:47           1
748701    23:58:49           1
748702    23:59:11           1
748703    23:59:47           1
748704    23:59:56           1

I want to group all values within each hour, so that I can see the total number of uses per hour (00:00:00 to 23:00:00).

I have the following code:

hora_pico_aug = hora_pico.groupby(pd.Grouper(key="Hora_Retiro", freq='H')).count()

The Hora_Retiro column is of timedelta64[ns] type. This code gives the following output:

                count_uses
Hora_Retiro            
00:00:02           2566
01:00:02            602
02:00:02            295
03:00:02              5
04:00:02             10
05:00:02           4002
06:00:02          16075
07:00:02          39410
08:00:02          76272
09:00:02          56721
10:00:02          36036
11:00:02          32011
12:00:02          33725
13:00:02          41032
14:00:02          50747
15:00:02          50338
16:00:02          42347
17:00:02          54674
18:00:02          76056
19:00:02          57958
20:00:02          34286
21:00:02          22509
22:00:02          13894
23:00:02           7134

However, the index starts at 00:00:02, and I want it to start at 00:00:00 and then proceed in one-hour intervals. Something like this:

                count_uses
Hora_Retiro            
00:00:00           2565
01:00:00            603
02:00:00            295
03:00:00              5
04:00:00             10
05:00:00           4002
06:00:00          16075
07:00:00          39410
08:00:00          76272
09:00:00          56721
10:00:00          36036
11:00:00          32011
12:00:00          33725
13:00:00          41032
14:00:00          50747
15:00:00          50338
16:00:00          42347
17:00:00          54674
18:00:00          76056
19:00:00          57958
20:00:00          34286
21:00:00          22509
22:00:00          13894
23:00:00           7134

How can I make it start at 00:00:00?

Thanks for the help!

You can create an hour column from the Hora_Retiro column.

# Hora_Retiro is timedelta64[ns], which has no .dt.hour; use .dt.components.hours
df['hour'] = df['Hora_Retiro'].dt.components.hours

Then group by hour:

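import pandas as pd

# Sum count_uses within each hour
gpby_df = df.groupby('hour')['count_uses'].sum().reset_index()
# Convert the integer hour (0-23) back into a time of day such as 00:00:00
gpby_df['hour'] = pd.to_datetime(gpby_df['hour'], format='%H').dt.time
gpby_df.columns = ['Hora_Retiro', 'sum_count_uses']
gpby_df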

This gives:

  Hora_Retiro  sum_count_uses
0    00:00:00              14
1    09:00:00               1
2    10:00:00               2
3    20:00:00               2
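A self-contained sketch of this approach on a few hypothetical rows (the df below is made up, not the asker's data). Since a timedelta64 column has no .dt.hour accessor, it reads the hour with .dt.components.hours. As a swapped-in variation, it converts the hour back with pd.to_timedelta(..., unit="h") instead of pd.to_datetime(...).dt.time, so the result keeps the same Timedelta type as the original column.

```python
import pandas as pd

# Hypothetical sample rows; Hora_Retiro is timedelta64[ns] as in the question
df = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(["00:00:18", "00:45:00", "09:10:00", "23:59:56"]),
    "count_uses": [1, 1, 1, 1],
})

# Timedelta columns have no .dt.hour; .dt.components.hours gives the hour (0-23)
df["hour"] = df["Hora_Retiro"].dt.components.hours

# Sum the uses per hour
gpby_df = df.groupby("hour")["count_uses"].sum().reset_index()

# Variation: turn the integer hour back into a Timedelta (e.g. 09:00:00)
gpby_df["hour"] = pd.to_timedelta(gpby_df["hour"], unit="h")
gpby_df.columns = ["Hora_Retiro", "sum_count_uses"]
print(gpby_df)
```

Keeping the result as Timedelta means it can be compared or joined directly against the original Hora_Retiro values.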

I assume that the Hora_Retiro column in your DataFrame is of Timedelta type. It is not datetime, since in that case the date part would also be printed.

Indeed, with a timedelta key, pd.Grouper anchors the hourly bins at the smallest value in the column (here 00:00:02), so every group starts at that minute/second instead of at a full hour.
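The effect can be reproduced with a few hypothetical rows (made up for illustration). Because the smallest value is 00:00:02, the hourly bins come out as 00:00:02, 01:00:02, and so on, matching the pattern in the question's output. The sketch uses the "h" frequency alias, which newer pandas versions prefer over "H".

```python
import pandas as pd

# Hypothetical rows; the smallest Hora_Retiro value is 00:00:02
hora_pico = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(["00:00:18", "00:00:02", "00:59:59", "01:00:05"]),
    "count_uses": [1, 1, 1, 1],
})

# Same call as in the question: the bins are anchored at the minimum value
binned = hora_pico.groupby(pd.Grouper(key="Hora_Retiro", freq="h")).count()
print(binned)
# The bin labels are the timedeltas 00:00:02 and 01:00:02, not full hours
```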

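To group by full hours:

  • floor (truncate) each element in this column to the hour, so that for example 00:45:00 becomes 00:00:00 (rounding to the nearest hour would instead move it into the 01:00:00 group),
  • then group by this floored value.

The code to do it is:

# Truncate each timedelta to the start of its hour, then count the rows in each hour
hora_pico.groupby(hora_pico.Hora_Retiro.dt.floor('h')).count_uses.count()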
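A quick check on hypothetical rows (made up for illustration) shows why truncating with dt.floor, rather than rounding to the nearest hour with dt.round, gives buckets that start at 00:00:00 and stay within 00:00:00 to 23:00:00.

```python
import pandas as pd

# Hypothetical rows, not the asker's data
hora_pico = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(
        ["00:00:18", "00:00:02", "00:59:59", "01:00:05", "23:59:56"]),
    "count_uses": [1, 1, 1, 1, 1],
})

# floor: every value goes to the hour it falls in (00:59:59 -> 00:00:00)
floored = hora_pico.groupby(hora_pico.Hora_Retiro.dt.floor("h")).count_uses.count()

# round: values past the half hour move to the next hour (23:59:56 -> 1 day 00:00:00)
rounded = hora_pico.groupby(hora_pico.Hora_Retiro.dt.round("h")).count_uses.count()

print(floored)
print(rounded)
```

With floor the groups are 00:00:00 (3 rows), 01:00:00 (1) and 23:00:00 (1); with round, 00:59:59 lands in 01:00:00 and 23:59:56 spills into a 1-day group.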

However, I advise you to decide what you want to count: rows, or the values in the count_uses column. In the second case, replace the count function with sum.
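To make the difference concrete, here is a sketch with hypothetical rows in which count_uses is not always 1 (made up for illustration). Hours are taken with dt.floor, so each row lands in the hour it falls in.

```python
import pandas as pd

# Hypothetical rows where count_uses is not always 1
hora_pico = pd.DataFrame({
    "Hora_Retiro": pd.to_timedelta(["00:10:00", "00:20:00", "01:05:00"]),
    "count_uses": [1, 3, 2],
})

hour = hora_pico.Hora_Retiro.dt.floor("h")

# count: number of rows in each hour
rows_per_hour = hora_pico.groupby(hour).count_uses.count()
# sum: total of the count_uses values in each hour
uses_per_hour = hora_pico.groupby(hour).count_uses.sum()

print(rows_per_hour.tolist())  # [2, 1]
print(uses_per_hour.tolist())  # [4, 2]
```

When every count_uses value is 1, as in the question's sample, both give the same numbers.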
