
Min, Max, Avg for 2-hour intervals

In the query below, I am calculating the minimum, the maximum, and the average over two-hour intervals using PostgreSQL.

The query works fine for even start hours (..04:00:00+05:30), but for odd start hours (..05:00:00+05:30) it returns the same result as the preceding even start hour. Multiplying by 2 always yields even hours, which is the problem.
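The effect can be seen in isolation with a minimal sketch (the hour values 0–5 are made-up literals, not data from the table): every hour maps onto an even bucket label, so a window starting at an odd hour cannot shift the bucket boundaries.

```sql
-- Every hour h maps to the even bucket floor(h/2)*2, so hours 1, 3, 5
-- land in the same buckets as hours 0, 2, 4.
SELECT h, (floor(h / 2.0) * 2)::int AS bucket
FROM   generate_series(0, 5) AS h;
-- h=0→0, h=1→0, h=2→2, h=3→2, h=4→4, h=5→4
```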

SELECT tagid, CAST(sample_time_stamp as Date) AS stat_date, 
       (floor(extract(hour from sample_time_stamp)/2) * 2)::int AS hrs,
       min(sensor_reading) AS theMin,   
       max(sensor_reading) AS theMax,
       avg(sensor_reading) AS theAvg
FROM sensor_readings WHERE tagid =1 AND 
sample_time_stamp BETWEEN '2012-10-23 01:00:00+05:30'
                  AND     '2012-10-23 05:59:00+05:30'
GROUP BY tagid,CAST(sample_time_stamp as Date),
         floor(extract(hour from sample_time_stamp)/2) * 2
ORDER BY tagid,stat_date, hrs

Output for odd start hour ('2012-10-23 01:00:00+05:30')

tagid    stat_date     hrs  theMin  theMax    theAvg
1        2012-10-23    0    6       58        30.95
1        2012-10-23    2    2       59        29.6916666666667
1        2012-10-23    4    3       89        31.7666666666667

Output for even start hour ('2012-10-23 02:00:00+05:30')

tagid    stat_date     hrs  theMin   theMax    theAvg
1        2012-10-23    2    2        59        29.6916666666667
1        2012-10-23    4    3        89        31.7666666666667

To get constant time-frames starting with your minimum timestamp:

WITH params AS (
   SELECT '2012-10-23 01:00:00+05:30'::timestamptz AS _min  -- input params
         ,'2012-10-23 05:59:00+05:30'::timestamptz AS _max
         ,'2 hours'::interval                      AS _interval
   )
  ,ts AS (SELECT generate_series(_min, _max, _interval) AS t_min FROM params)
  ,timeframe AS (
   SELECT t_min
         ,lead(t_min, 1, _max) OVER (ORDER BY t_min) AS t_max
   FROM ts, params
   )
SELECT s.tagid
      ,t.t_min
      ,t.t_max     -- mildly redundant except for last row
      ,min(s.sensor_reading) AS the_min
      ,max(s.sensor_reading) AS the_max
      ,avg(s.sensor_reading) AS the_avg
FROM   timeframe t
LEFT   JOIN sensor_readings s ON  s.tagid = 1
                              AND s.sample_time_stamp >= t.t_min
                              AND s.sample_time_stamp <  t.t_max
GROUP  BY 1,2,3
ORDER  BY 1,2;

Can be used for any time frame and any interval length. Requires PostgreSQL 8.4 or later.

If the maximum timestamp _max does not fall on _min + n * _interval, the last time frame is truncated. The last row can therefore represent a shorter time frame than your desired _interval.
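With the question's example parameters, running the raster-generating CTEs on their own (a sketch reusing the same params and ts CTEs, without the join) shows the truncation of the last frame:

```sql
-- Raster only, with the example parameters from the question:
WITH params AS (
   SELECT '2012-10-23 01:00:00+05:30'::timestamptz AS _min
         ,'2012-10-23 05:59:00+05:30'::timestamptz AS _max
         ,'2 hours'::interval                      AS _interval
   )
  ,ts AS (SELECT generate_series(_min, _max, _interval) AS t_min FROM params)
SELECT t_min
      ,lead(t_min, 1, _max) OVER (ORDER BY t_min) AS t_max
FROM   ts, params;
-- 01:00 → 03:00, 03:00 → 05:00, 05:00 → 05:59  (last frame is only 59 minutes)
```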

Key elements

  • Common Table Expressions (CTEs) for easier handling. Enter parameter values once, in the top CTE params.

  • generate_series() for intervals to create the time raster.

  • Window function lead(...) with 3 parameters (including the default) to cover the special case of the last row.

  • LEFT JOIN between the raster and the actual data, so that time frames without matching data still show in the result (with NULL values for the data columns). That's also the reason for a later edit: the WHERE conditions had to move into the LEFT JOIN condition to achieve that.
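The 3-parameter form of lead() mentioned above can be demonstrated standalone (the values 1–3 and the default 99 are made up for illustration): the third argument is returned for the last row, where there is no following row to peek at.

```sql
-- lead(expr, offset, default): the default fills in where the window
-- would otherwise run off the end of the partition.
SELECT n
      ,lead(n, 1, 99) OVER (ORDER BY n) AS next_or_default
FROM   generate_series(1, 3) AS n;
-- n=1→2, n=2→3, n=3→99 (the default kicks in on the last row)
```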

Alternative time frame generation with recursive CTE:

WITH RECURSIVE params AS (
   SELECT '2012-10-23 01:00:00+05:30'::timestamptz AS _min  -- input params
         ,'2012-10-23 05:59:00+05:30'::timestamptz AS _max
         ,'2 hours'::interval                      AS _interval
   )
   , timeframe AS (
   SELECT _min AS t_min, LEAST(_min + _interval, _max) AS t_max
   FROM   params

   UNION  ALL
   SELECT t_max, LEAST(t_max + _interval, _max)
   FROM   timeframe t, params p
   WHERE  t_max < _max
   )
SELECT ...

Slightly faster ... take your pick.
-> sqlfiddle displaying both.

Note that you can additionally have non-recursive CTEs in a query declared WITH RECURSIVE.
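A minimal standalone example of that mix (the CTE names plain and cnt and the limit 3 are made up): the RECURSIVE keyword applies to the whole WITH list, and plain CTEs sit alongside recursive ones, just like params does above.

```sql
WITH RECURSIVE plain AS (      -- non-recursive CTE
   SELECT 3 AS max_n
   )
  ,cnt AS (                    -- recursive CTE referencing the plain one
   SELECT 1 AS n
   UNION ALL
   SELECT n + 1 FROM cnt, plain WHERE n < max_n
   )
SELECT n FROM cnt;
-- 1, 2, 3
```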

Performance & Index

Should be faster than your original query. Half the code deals with generating the time raster, which concerns only a few rows and is very fast. Handling the actual table rows (the expensive part) gets cheaper, because we no longer calculate a new value from every sample_time_stamp.

You should definitely have a multi-column index of the form:

CREATE INDEX foo_idx ON sensor_readings (tagid, sample_time_stamp DESC);

I use DESC on the assumption that you more often query recent entries (later timestamps). Remove the modifier if that's not the case. Doesn't make a big difference either way.
