
How do I group a time series by hour of day?

I have a time series and I want to group the rows by hour of day (regardless of date) and visualize the groups as boxplots. So I'd want 24 boxplots: one for hour 1, one for hour 2, one for hour 3, and so on.

The way I see this working is splitting the dataset up into 24 series (one for each hour of the day), creating a boxplot for each series, and then plotting them all on the same axes.

The only way I can think of to do this is to manually select all the values within each hour. Is there a faster way?

Some sample data:

Date    Actual Consumption
2018-01-01 00:00:00 47.05
2018-01-01 00:15:00 46
2018-01-01 00:30:00 44
2018-01-01 00:45:00 45
2018-01-01 01:00:00 43.5
2018-01-01 01:15:00 43.5
2018-01-01 01:30:00 43
2018-01-01 01:45:00 42.5
2018-01-01 02:00:00 43
2018-01-01 02:15:00 42.5
2018-01-01 02:30:00 41
2018-01-01 02:45:00 42.5
2018-01-01 03:00:00 42.04
2018-01-01 03:15:00 41.96
2018-01-01 03:30:00 44
2018-01-01 03:45:00 44
2018-01-01 04:00:00 43.54
2018-01-01 04:15:00 43.46
2018-01-01 04:30:00 43.5
2018-01-01 04:45:00 43
2018-01-01 05:00:00 42.04

This is what I've tried so far:

zero = df.between_time('00:00', '00:59')
one = df.between_time('01:00', '01:59')
two = df.between_time('02:00', '02:59')

I would then plot a boxplot for each of these on the same axes. However, it's very tedious to do this for all 24 hours of the day.

This is the kind of output I want: https://www.researchgate.net/figure/Boxplot-of-the-NOx-data-by-hour-of-the-day_fig1_24054015

There are two steps to achieve this:

  1. Convert the Actual column (the time of day) to datetime:

     df.Actual = pd.to_datetime(df.Actual) 
  2. Group by the hour:

     df.groupby([df.Date, df.Actual.dt.hour+1]).Consumption.sum().reset_index() 

I assumed you wanted to sum the Consumption (if you want the mean or some other aggregate instead, just change it). One note: hour+1 makes the hours start from 1 rather than 0 (remove it if you want 0 to be midnight).
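The two steps above treat Date and Actual as separate columns. The use of between_time in the question suggests the timestamps may instead sit in a DatetimeIndex; under that assumption, a minimal sketch of the same grouping could look like this. The column name Consumption and the level name Hour are shortened, illustrative names, not taken from the original frame.

```python
import pandas as pd

# Sample data from the question, assumed here to be indexed by timestamp
# (a DatetimeIndex) with a single value column named "Consumption".
values = [47.05, 46, 44, 45, 43.5, 43.5, 43, 42.5, 43, 42.5, 41,
          42.5, 42.04, 41.96, 44, 44, 43.54, 43.46, 43.5, 43, 42.04]
idx = pd.date_range("2018-01-01 00:00", periods=len(values), freq="15min")
df = pd.DataFrame({"Consumption": values}, index=idx)

# Group by calendar date and by hour of day; +1 shifts hours to run 1..24.
hourly = (
    df.groupby([df.index.date, df.index.hour + 1])["Consumption"]
    .sum()
    .rename_axis(["Date", "Hour"])  # name the two group levels
    .reset_index()
)
print(hourly)
```

Dropping df.index.date from the list of keys would pool all days together, which is closer to the "regardless of date" grouping the question asks for.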

Result of the groupby on the sample data:

    Date    Actual  Consumption
0   2018-01-01  1   182.05
1   2018-01-01  2   172.50
2   2018-01-01  3   169.00
3   2018-01-01  4   172.00
4   2018-01-01  5   173.50
5   2018-01-01  6   42.04
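The sums above collapse each hour to a single number, whereas the question asks for one boxplot per hour of day. A rough sketch of that, assuming a DatetimeIndex and matplotlib as the plotting backend, is to label each row with its hour and let pandas draw the boxes on shared axes. The data below is synthetic and only stands in for the real series; the column names Consumption and Hour are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the real series: two days of 15-minute readings.
idx = pd.date_range("2018-01-01", periods=2 * 96, freq="15min")
rng = np.random.default_rng(0)
df = pd.DataFrame({"Consumption": 45 + rng.normal(0, 2, len(idx))}, index=idx)

# Label every row with its hour of day (0-23), ignoring the date.
df["Hour"] = df.index.hour

# Draw one box per hour on a single set of axes.
fig, ax = plt.subplots(figsize=(10, 4))
df.boxplot(column="Consumption", by="Hour", ax=ax, grid=False)
ax.set_xlabel("Hour of day")
ax.set_ylabel("Consumption")
ax.set_title("")
fig.suptitle("")  # drop the automatic "Boxplot grouped by Hour" title
```

Using df.index.hour + 1 instead would number the boxes 1 to 24, matching the hour+1 convention in the answer. In an interactive session, plt.show() or fig.savefig(...) displays or stores the figure.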
