简体   繁体   中英

How do I effectively select the average sum of several sums being calculated based on different timestamps in SQL?

I have a database table looking like the following:

id | macaddr | load | timestamp
=========================================
 1 | 0011111 |   17 | 2012-02-07 10:00:00
 1 | 0011111 |    6 | 2012-02-07 12:00:00
 2 | 0022222 |    3 | 2012-02-07 12:00:03
 3 | 0033333 |    9 | 2012-02-07 12:00:04
 4 | 0022222 |    4 | 2012-02-07 12:00:06
 5 | 0033333 |    8 | 2012-02-07 12:00:10
...

Now, I would like to calculate the average load over all devices (= mac addresses) for different time periods, eg, for today, yesterday, this week, this month.

An average load can be calculated by first finding out the overall load sum at different points in time (sample dates) and then calculating the average of the load sums for these sample dates. For example, if I wanted the average load of the last ten seconds (and now is 2012-02-07 12:00:10), I could decide about my sample dates to be at 12:00:02, 12:00:04, 12:00:06, 12:00:08, and 12:00:10. Then, I would calculate the load sums by summing up the most recent load values for each device:

2012-02-07 12:00:02 |  6  [= load(id=2)]
2012-02-07 12:00:04 | 18  [= load(id=2) + load(id=3) + load(id=4)]
2012-02-07 12:00:06 | 19  [= load(id=2) + load(id=4) + load(id=5)]
2012-02-07 12:00:08 | 19  [= load(id=2) + load(id=4) + load(id=5)]
2012-02-07 12:00:10 | 18  [= load(id=2) + load(id=5) + load(id=6)]

A device's load value is ignored if it is older than, eg, an hour (happened here to id=1). The average would be 16 in this case.

Currently, I generate a rather complex query with many "UNION ALL" statements which is reeeeally slow:

SELECT avg(l.load_sum) as avg_load
FROM (
    SELECT sum(so.load) AS load_sum 
    FROM (
        SELECT * 
        FROM (
            SELECT si.macaddr, si.load 
            FROM sensor_data si WHERE si.timestamp > '2012-02-07 11:00:10' AND si.timestamp < '2012-02-07 12:00:10'
            ORDER BY si.timestamp DESC 
        ) AS sm
        GROUP BY macaddr
    ) so
    UNION ALL
    [THE SAME THING AGAIN WITH OTHER TIMESTAMPS]
    UNION ALL
    [AND AGAIN]
    UNION ALL
    [AND AGAIN]
    ...
) l

Now imagine I would like to calculate the average load for a whole month. With hourly sample dates I need to join 30x24=720 queries using the UNION ALL statement. The overall query takes nearly a minute to complete on my machine. I am sure there is a much better solution without the UNION ALL statement. However, I did not find anything useful on the Web. I would therefore be very thankful for your help!

Use a fraction of the unix timestamp: First we formulate the hourly (3600 seconds) averages:

SELECT
  macaddr, 
  sum(CAST(load AS float))/CAST(count(*) AS float) AS loadavg,
  FLOOR(UNIX_TIMESTAMP(`timestamp`)/3600) AS hourbase
FROM sensor_data
GROUP BY macaddr,FLOOR(UNIX_TIMESTAMP(`timestamp`)/3600)

Then we average those over the month

SELECT 
  avg(loadavg) as monthlyavg,
  macaddr
FROM (
    SELECT
      macaddr, 
      sum(CAST(load AS float))/CAST(count(*) AS float) AS loadavg,
      FLOOR(UNIX_TIMESTAMP(`timestamp`)/3600) AS hourbase
    FROM sensor_data
    WHERE `timestamp` BETWEEN '2012-01-07 12:00:00' AND '2012-02-07 11:59:59'
    GROUP BY macaddr,FLOOR(UNIX_TIMESTAMP(`timestamp`)/3600)
) AS hourlies
GROUP BY macaddr, hourbase

To make things easier for yourself you should create an "hour" function, that returns a datetime with no significant figures after the hour part. So right now (2/2/2012 5:05pm) would be 2012-02-07 17:00. Here's the code for your hour function:

select dateadd(hh, DATEPART(hh, current_timestamp), DATEADD(dd, 0, datediff(dd, 0, current_timestamp)))

(replace current_timestamp in the above code with the datetime parameter of your hour function. I'll assume you created it as dbo.fnHour(), and it takes a datetime parameter.

You can then use the dbo.fnHour() as a partitioning function to query what you want. Your sql will look something like this:

select 
    avg(load) as avg_load
from (
    select dbo.fnHour(si.timestamp) [hour], macaddr, sum(load) as [load]
    from 
        sensor_data si 
    where 
        si.timestamp >= dateadd(mm, -1, current_timestamp)
    group by 
        dbo.fnHour(si.timestamp), macaddr
) as f

I haven't tested it so there may be some typos but this should be enough to get you going.

I may be misunderstanding what you are trying to do. It looks like you are making things a lot more complicated than they need to be using the sampling. Perhaps giving samples of what the result should look like would allow people to provide better solutions for your particular case.

mysql> SELECT * FROM `test`;
+----+-----+------+------------+
| id | mac | load | when       |
+----+-----+------+------------+
|  1 |   1 |   10 | 2012-02-01 |
|  2 |   1 |   20 | 2012-01-01 |
|  3 |   2 |   60 | 2011-09-01 |
+----+-----+------+------------+

mysql> SELECT avg(`sum_load`)
    -> FROM 
    -> (
    ->    SELECT sum( `load` ) as sum_load
    ->    FROM `test`
    ->    WHERE `when` > '2011-01-15'
    ->    GROUP BY `mac`
    -> ) as t1;
+-----------------+
| avg(`sum_load`) |
+-----------------+
|         45.0000 |
+-----------------+

mysql> SELECT avg(`sum_load`)
    -> FROM 
    -> (
    ->    SELECT sum( `load` ) as sum_load
    ->    FROM `test`
    ->    WHERE `when` > '2011-01-15' AND `when` < '2012-01-15'
    ->    GROUP BY `mac`
    -> ) as t1;
+-----------------+
| avg(`sum_load`) |
+-----------------+
|         40.0000 |
+-----------------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM