How do I efficiently select the average of several sums calculated at different timestamps in SQL?
I have a database table that looks like this:
id | macaddr | load | timestamp
=========================================
1 | 0011111 | 17 | 2012-02-07 10:00:00
2 | 0011111 | 6 | 2012-02-07 12:00:00
3 | 0022222 | 3 | 2012-02-07 12:00:03
4 | 0033333 | 9 | 2012-02-07 12:00:04
5 | 0022222 | 4 | 2012-02-07 12:00:06
6 | 0033333 | 8 | 2012-02-07 12:00:10
...
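Now I would like to calculate the average load of all devices (= MAC addresses) for different time periods (e.g. today, yesterday, this week, this month).
The average load is calculated by first determining the load sum over all devices at different points in time (sample dates) and then averaging those load sums. For example, if I want the average load of the last ten seconds (now being 2012-02-07 12:00:10), I could pick the sample dates 12:00:02, 12:00:04, 12:00:06, 12:00:08 and 12:00:10. I would then calculate each load sum by adding up the most recent load value of every device: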
2012-02-07 12:00:02 | 6 [= load(id=2)]
2012-02-07 12:00:04 | 18 [= load(id=2) + load(id=3) + load(id=4)]
2012-02-07 12:00:06 | 19 [= load(id=2) + load(id=4) + load(id=5)]
2012-02-07 12:00:08 | 19 [= load(id=2) + load(id=4) + load(id=5)]
2012-02-07 12:00:10 | 18 [= load(id=2) + load(id=5) + load(id=6)]
If a device's load value is older than one hour (here id = 1), it is ignored. In this case the average is (6 + 18 + 19 + 19 + 18) / 5 = 16.
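The sample-date calculation can be checked outside the database. The following is only a minimal Python sketch, not part of the original question: the function names `load_sum_at` and `average_load` are made up for illustration, and it assumes that a reading counts for a sample date when it is at most one hour old and not later than the sample date itself.

```python
from datetime import datetime, timedelta

# Rows from the question's table: (macaddr, load, timestamp).
ROWS = [
    ("0011111", 17, datetime(2012, 2, 7, 10, 0, 0)),
    ("0011111", 6, datetime(2012, 2, 7, 12, 0, 0)),
    ("0022222", 3, datetime(2012, 2, 7, 12, 0, 3)),
    ("0033333", 9, datetime(2012, 2, 7, 12, 0, 4)),
    ("0022222", 4, datetime(2012, 2, 7, 12, 0, 6)),
    ("0033333", 8, datetime(2012, 2, 7, 12, 0, 10)),
]


def load_sum_at(rows, sample, max_age=timedelta(hours=1)):
    """Sum the latest load of each device as of `sample`, skipping stale readings."""
    latest = {}  # macaddr -> (timestamp, load)
    for mac, load, ts in rows:
        # Assumption: a reading counts if sample - max_age < ts <= sample.
        if sample - max_age < ts <= sample:
            if mac not in latest or ts > latest[mac][0]:
                latest[mac] = (ts, load)
    return sum(load for _, load in latest.values())


def average_load(rows, samples):
    """Average the per-sample load sums."""
    sums = [load_sum_at(rows, s) for s in samples]
    return sum(sums) / len(sums)


NOW = datetime(2012, 2, 7, 12, 0, 10)
# Sample dates 12:00:02, 12:00:04, ..., 12:00:10.
SAMPLES = [NOW - timedelta(seconds=2 * k) for k in range(4, -1, -1)]
SUMS = [load_sum_at(ROWS, s) for s in SAMPLES]

print(SUMS)                         # [6, 18, 19, 19, 18]
print(average_load(ROWS, SAMPLES))  # 16.0
```

With the six rows from the table this reproduces the load sums listed above and the average of 16.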
Currently I build a rather complex query out of many UNION ALL statements, and it is really slow:
SELECT avg(l.load_sum) AS avg_load
FROM (
    SELECT sum(so.load) AS load_sum
    FROM (
        SELECT *
        FROM (
            SELECT si.macaddr, si.load
            FROM sensor_data si
            WHERE si.timestamp > '2012-02-07 11:00:10'
              AND si.timestamp < '2012-02-07 12:00:10'
            ORDER BY si.timestamp DESC
        ) AS sm
        GROUP BY macaddr
    ) so
    UNION ALL
    [THE SAME THING AGAIN WITH OTHER TIMESTAMPS]
    UNION ALL
    [AND AGAIN]
    UNION ALL
    [AND AGAIN]
    ...
) l
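Now imagine I want the average load of a whole month. With hourly sample dates I have to join 30 x 24 = 720 queries with UNION ALL statements. The whole query takes nearly a minute to complete on my machine. I am sure there is a better solution without UNION ALL, but I could not find anything useful on the web. Any help is much appreciated!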
Using a bit of integer arithmetic on Unix timestamps: first, we compute the average per hour (3600 seconds):
SELECT
    macaddr,
    -- LOAD is a reserved word in MySQL, so the column name needs backticks
    AVG(`load`) AS loadavg,
    FLOOR(UNIX_TIMESTAMP(`timestamp`) / 3600) AS hourbase
FROM sensor_data
GROUP BY macaddr, FLOOR(UNIX_TIMESTAMP(`timestamp`) / 3600)
Then we average over one month:
SELECT
    avg(loadavg) AS monthlyavg,
    macaddr
FROM (
    SELECT
        macaddr,
        AVG(`load`) AS loadavg,
        FLOOR(UNIX_TIMESTAMP(`timestamp`) / 3600) AS hourbase
    FROM sensor_data
    WHERE `timestamp` BETWEEN '2012-01-07 12:00:00' AND '2012-02-07 11:59:59'
    GROUP BY macaddr, FLOOR(UNIX_TIMESTAMP(`timestamp`) / 3600)
) AS hourlies
-- group by device only; also grouping by hourbase would return the hourly rows again
GROUP BY macaddr
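To see what this two-step approach is meant to compute (an average of hourly averages per device), here is a small Python sketch of the same bucketing. It is only an illustration: `hour_bucket` and `monthly_avg_per_device` are hypothetical names, and timestamps are treated as UTC, whereas MySQL's UNIX_TIMESTAMP() uses the session time zone.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts):
    """Mimic FLOOR(UNIX_TIMESTAMP(ts) / 3600); assumes `ts` is in UTC."""
    return int(ts.replace(tzinfo=timezone.utc).timestamp()) // 3600


def monthly_avg_per_device(rows):
    """Average each device's hourly averages; rows are (macaddr, load, timestamp)."""
    buckets = defaultdict(list)  # (macaddr, hour bucket) -> loads in that hour
    for mac, load, ts in rows:
        buckets[(mac, hour_bucket(ts))].append(load)

    hourly = defaultdict(list)  # macaddr -> list of hourly averages
    for (mac, _hour), loads in buckets.items():
        hourly[mac].append(sum(loads) / len(loads))

    return {mac: sum(avgs) / len(avgs) for mac, avgs in hourly.items()}


ROWS = [
    ("0011111", 17, datetime(2012, 2, 7, 10, 0, 0)),
    ("0011111", 6, datetime(2012, 2, 7, 12, 0, 0)),
    ("0022222", 3, datetime(2012, 2, 7, 12, 0, 3)),
    ("0033333", 9, datetime(2012, 2, 7, 12, 0, 4)),
    ("0022222", 4, datetime(2012, 2, 7, 12, 0, 6)),
    ("0033333", 8, datetime(2012, 2, 7, 12, 0, 10)),
]
RESULT = monthly_avg_per_device(ROWS)
print(RESULT)  # {'0011111': 11.5, '0022222': 3.5, '0033333': 8.5}
```

Note that this gives one value per device. It does not sum the devices' latest loads at each sample date, which is what the question describes.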
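To make things easier, you should create an "hour" function that returns a datetime with nothing significant after the hour part. So right now (Feb 7, 2012, 5:05 PM) would become 2012-02-07 17:00. Here is the code for your hour function: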
select dateadd(hh, DATEPART(hh, current_timestamp), DATEADD(dd, 0, datediff(dd, 0, current_timestamp)))
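(Replace current_timestamp in the code above with the datetime parameter of your hour function.) I am assuming that you create it as dbo.fnHour() and that it takes a datetime parameter.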
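For comparison, the same truncation is easy to express outside SQL Server. This is just a minimal Python sketch of what such an hour function returns; `fn_hour` is a made-up stand-in for dbo.fnHour(), not an existing library function.

```python
from datetime import datetime


def fn_hour(ts):
    """Return `ts` truncated to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


print(fn_hour(datetime(2012, 2, 7, 17, 5)))  # 2012-02-07 17:00:00
```

All timestamps within the same hour map to the same value, which is what makes the result usable as a grouping key.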
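You can then use dbo.fnHour() as a partitioning function to query for what you need. Your SQL would look something like this:
select
    avg([load]) as avg_load
from (
    -- LOAD is a reserved keyword in T-SQL, so the column name is bracketed
    select dbo.fnHour(si.timestamp) [hour], macaddr, sum([load]) as [load]
    from
        sensor_data si
    where
        si.timestamp >= dateadd(mm, -1, current_timestamp)
    group by
        dbo.fnHour(si.timestamp), macaddr
) as f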
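I haven't tested this, so there may be a few typos, but it should be enough to get you going.
I may be misunderstanding what you are trying to do. It looks like you are making things far more complicated than they need to be by sampling. If you give an example of what the result should look like, people may be able to offer a better solution for your specific case.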
mysql> SELECT * FROM `test`;
+----+-----+------+------------+
| id | mac | load | when |
+----+-----+------+------------+
| 1 | 1 | 10 | 2012-02-01 |
| 2 | 1 | 20 | 2012-01-01 |
| 3 | 2 | 60 | 2011-09-01 |
+----+-----+------+------------+
mysql> SELECT avg(`sum_load`)
-> FROM
-> (
-> SELECT sum( `load` ) as sum_load
-> FROM `test`
-> WHERE `when` > '2011-01-15'
-> GROUP BY `mac`
-> ) as t1;
+-----------------+
| avg(`sum_load`) |
+-----------------+
| 45.0000 |
+-----------------+
mysql> SELECT avg(`sum_load`)
-> FROM
-> (
-> SELECT sum( `load` ) as sum_load
-> FROM `test`
-> WHERE `when` > '2011-01-15' AND `when` < '2012-01-15'
-> GROUP BY `mac`
-> ) as t1;
+-----------------+
| avg(`sum_load`) |
+-----------------+
| 40.0000 |
+-----------------+
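The two results can be reproduced by hand. The Python sketch below mirrors the grouped query on the three test rows; `avg_of_sums` is a hypothetical helper written for this check, and it assumes that at least one row matches the date filter.

```python
from collections import defaultdict
from datetime import date

# Rows of the `test` table above: (mac, load, when).
TEST = [
    (1, 10, date(2012, 2, 1)),
    (1, 20, date(2012, 1, 1)),
    (2, 60, date(2011, 9, 1)),
]


def avg_of_sums(rows, after, before=None):
    """Sum the load per mac within the date range, then average those sums."""
    sums = defaultdict(int)  # mac -> summed load
    for mac, load, when in rows:
        if when > after and (before is None or when < before):
            sums[mac] += load
    # Assumption: at least one row matched, otherwise this divides by zero.
    return sum(sums.values()) / len(sums)


print(avg_of_sums(TEST, date(2011, 1, 15)))                     # 45.0
print(avg_of_sums(TEST, date(2011, 1, 15), date(2012, 1, 15)))  # 40.0
```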