SQL averages per row from multiple columns and NULLs
I have an app that logs sensor data, and I want to produce averages across multiple sensors: one, two, three, or many.
EDIT: These are temperature sensors, so 0 is a legitimate reading that the sensors may store in the database.
My initial starting point was this SQL query:
SELECT grid.t5||'.000000' as ts,
avg(t.sensorvalue) sensorvalue1
, avg(w.sensorvalue) AS sensorvalue2
FROM
(SELECT generate_series(min(date_trunc('hour', ts))
,max(ts), interval '5 min') AS t5 FROM device_history_20865735 where
ts between '2015/05/13 09:00' and '2015/05/14 09:00' ) grid
LEFT JOIN device_history_20865735 t ON t.ts >= grid.t5 AND t.ts < grid.t5 + interval '5 min'
LEFT JOIN device_history_493417852 w ON w.ts >= grid.t5 AND w.ts < grid.t5 + interval '5 min'
--WHERE t.sensorvalue notnull
GROUP BY grid.t5 ORDER BY grid.t5
I use 5-minute averages because that granularity works better for my app.
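For readers less familiar with the `generate_series` grid plus range `LEFT JOIN` pattern, here is a rough in-memory Python model of what the query computes for a single sensor. Every 5-minute slot from the top of the first hour up to the last reading gets a row, and slots with no readings come out as `None` (SQL NULL), which is exactly where the NULLs in the results come from. The function name `five_minute_averages` and the sample readings are hypothetical, and the sketch ignores the `BETWEEN` filter on the source table.

```python
from datetime import datetime, timedelta
from statistics import mean

STEP = timedelta(minutes=5)  # width of one grid slot, like interval '5 min'


def five_minute_averages(readings):
    """Hypothetical model of the grid + LEFT JOIN + avg() query for one sensor.

    readings: list of (datetime, float) pairs.
    Returns one (slot_start, average_or_None) pair per 5-minute slot.
    """
    if not readings:
        return []
    # date_trunc('hour', min(ts)): start the grid at the top of the first hour
    start = min(ts for ts, _ in readings).replace(minute=0, second=0, microsecond=0)
    end = max(ts for ts, _ in readings)
    result = []
    slot = start
    while slot <= end:  # generate_series stops at the last step not past max(ts)
        in_slot = [v for ts, v in readings if slot <= ts < slot + STEP]
        # avg() over a slot with no joined readings is NULL; mirror that with None
        result.append((slot, mean(in_slot) if in_slot else None))
        slot += STEP
    return result


# Hypothetical readings for a single sensor.
readings = [
    (datetime(2015, 5, 13, 9, 1), 19.9),
    (datetime(2015, 5, 13, 9, 3), 20.0),
    (datetime(2015, 5, 13, 9, 16), 20.06),
]
for slot, avg in five_minute_averages(readings):
    print(slot, avg)
```

The 09:05 and 09:10 slots have no readings, so they print `None`, just like the empty cells in the result set.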
As expected, the results contain NULL values for either sensorvalue1 or sensorvalue2:
ts;sensorvalue1;sensorvalue2
"2015-05-13 09:00:00.000000";19.9300003051758;
"2015-05-13 09:05:00.000000";20;
"2015-05-13 09:10:00.000000";;
"2015-05-13 09:15:00.000000";20.0599994659424;
"2015-05-13 09:20:00.000000";;
"2015-05-13 09:25:00.000000";20.1200008392334;
My aim is to calculate, for each 5-minute interval, an average across all available sensors. Since the NULLs are a problem, I thought of using a CASE expression so that if one sensor's value is NULL, the other sensor's value is used instead:
SELECT grid.t5||'.000000' as ts,
CASE
WHEN avg(t.sensorvalue) ISNULL THEN avg(w.sensorvalue)
ELSE avg(t.sensorvalue)
END AS sensorvalue
,
CASE
WHEN avg(w.sensorvalue) ISNULL THEN avg(t.sensorvalue)
ELSE avg(w.sensorvalue)
END AS sensorvalue2
FROM
(SELECT generate_series(min(date_trunc('hour', ts)),max(ts), interval '5 min') AS t5
FROM device_history_20865735 where
ts between '2015/05/13 09:00' and '2015/05/14 09:00' ) grid
LEFT JOIN device_history_20865735 t ON t.ts >= grid.t5 AND t.ts < grid.t5 + interval '5 min'
LEFT JOIN device_history_493417852 w ON w.ts >= grid.t5 AND w.ts < grid.t5 + interval '5 min'
GROUP BY grid.t5 ORDER BY grid.t5
But then, to calculate the average, I have to wrap this in another SELECT and divide by the number of columns (i.e. sensors). With just two sensors that is OK, but with 3 or 4 sensors it can get very messy, because several sensors could have NULL values in the same row.
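A quick numeric check of why the divisor has to be the number of non-NULL readings in each row rather than the number of columns. The three values below are made up, `None` stands for a NULL 5-minute average, and `row_average` is a hypothetical helper name.

```python
def row_average(values):
    """Average of the non-None values in one row, or None if all are None."""
    present = [v for v in values if v is not None]  # a genuine 0.0 reading is kept
    return sum(present) / len(present) if present else None


# One row of per-sensor 5-minute averages; None plays the role of SQL NULL.
row = [20.0, None, 22.0]

# Naive: treat NULL as 0 and divide by the number of columns (sensors).
naive = sum(v if v is not None else 0.0 for v in row) / len(row)

print(naive)             # 14.0, dragged down by the missing sensor
print(row_average(row))  # 21.0
```

Because the filter tests for `None` rather than for a falsy value, a real 0-degree reading still counts toward the average.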
The SQL is generated programmatically by an app (written in Python) against PostgreSQL 9.4. Is there a simpler way to achieve this? I feel I'm going down a rather complex route.
EDIT #2: Based on your input I've produced the following SQL. Again it seems rather complex, so I'm open to your ideas and scrutiny on whether it is reliable and maintainable:
SELECT ts, sensortotal, sensorcount,
CASE
WHEN sensorcount = 0 THEN -1000
ELSE sensortotal/sensorcount
END AS sensorAvg
FROM (
WITH grid as (
SELECT t5
FROM (SELECT generate_series(min(date_trunc('hour', ts)), max(ts), interval '5 min') as t5
FROM device_history_20865735
) d
WHERE t5 between '2015-05-13 09:00' and '2015-05-14 09:00'
)
SELECT d1.t5 || '.000000' as ts
, Coalesce(avg(d1.sensorvalue), 0) + Coalesce(avg(d2.sensorvalue),0) as sensorTotal
, (CASE
WHEN avg(d1.sensorvalue) ISNULL THEN 0
ELSE 1
END + CASE
WHEN avg(d2.sensorvalue) ISNULL THEN 0
ELSE 1
END) as sensorCount
FROM (SELECT grid.t5, avg(t.sensorvalue) as sensorvalue
FROM grid LEFT JOIN
device_history_20865735 t
ON t.ts >= grid.t5 AND t.ts <grid.t5 + interval '5 min'
GROUP BY grid.t5
) d1 LEFT JOIN
(SELECT grid.t5, avg(t.sensorvalue) as sensorvalue
FROM grid LEFT JOIN
device_history_493417852 t
ON t.ts >= grid.t5 AND t.ts <grid.t5 + interval '5 min'
GROUP BY grid.t5
) d2 on d1.t5 = d2.t5
GROUP BY d1.t5
ORDER BY d1.t5
) tmp;
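One way to sanity-check the total/count logic of the query above is to run just its per-row CASE arithmetic against a few hand-made rows. This sketch uses Python's built-in `sqlite3` module instead of PostgreSQL (so there is no `generate_series` or interval arithmetic), and the table `slot_avgs` with columns `s1` and `s2` is hypothetical: it stands in for the already-aggregated `d1`/`d2` averages. It shows that a genuine 0.0 reading is counted and that a slot with no readings falls through to the -1000 sentinel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the per-slot averages d1.sensorvalue / d2.sensorvalue.
conn.execute("CREATE TABLE slot_avgs (t5 TEXT, s1 REAL, s2 REAL)")
conn.executemany(
    "INSERT INTO slot_avgs VALUES (?, ?, ?)",
    [
        ("09:00", 19.0, 21.0),  # both sensors reported
        ("09:05", 0.0, None),   # a real 0-degree reading, second sensor missing
        ("09:10", None, None),  # no readings at all
    ],
)

# Same shape as the EDIT #2 query: total via coalesce, count via CASE, sentinel.
query = """
SELECT t5,
       CASE WHEN sensorcount = 0 THEN -1000
            ELSE sensortotal / sensorcount
       END AS sensoravg
FROM (
    SELECT t5,
           coalesce(s1, 0) + coalesce(s2, 0) AS sensortotal,
           (CASE WHEN s1 IS NULL THEN 0 ELSE 1 END
          + CASE WHEN s2 IS NULL THEN 0 ELSE 1 END) AS sensorcount
    FROM slot_avgs
) tmp
ORDER BY t5
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The 09:05 slot averages to 0.0 from one sensor rather than being treated as missing, which matters given that 0 is a valid temperature here.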
Thanks!
To get accurate averages, you need to calculate each sensor's average separately, in its own subquery, before the join:
WITH grid as (
SELECT t5
FROM (SELECT generate_series(min(date_trunc('hour', ts)), max(ts), interval '5 min') as t5
FROM device_history_20865735
) d
WHERE t5 between '2015-05-13 09:00' and '2015-05-14 09:00'
)
SELECT d1.t5 || '.000000' as ts,
avg(d1.sensorvalue) as sensorvalue1
, avg(d2.sensorvalue) as sensorvalue2
FROM (SELECT grid.t5, avg(t.sensorvalue) as sensorvalue
FROM grid LEFT JOIN
device_history_20865735 t
ON t.ts >= grid.t5 AND t.ts <grid.t5 + interval '5 min'
GROUP BY grid.t5
) d1 LEFT JOIN
(SELECT grid.t5, avg(t.sensorvalue) as sensorvalue
FROM grid LEFT JOIN
device_history_493417852 t
ON t.ts >= grid.t5 AND t.ts <grid.t5 + interval '5 min'
GROUP BY grid.t5
) d2 on d1.t5 = d2.t5
GROUP BY d1.t5
ORDER BY d1.t5;
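The reason for aggregating each sensor before the join is easy to see on a toy example. When both raw tables are joined to the same 5-minute slot, every row of one is paired with every row of the other, so any sum or count over the joined rows is inflated. The sketch below uses Python's `sqlite3` with hypothetical tables `grid`, `t` and `w` and a single text slot label instead of the timestamp grid and range conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw readings for two sensors, all inside the same 5-minute slot.
conn.execute("CREATE TABLE t (slot TEXT, sensorvalue REAL)")
conn.execute("CREATE TABLE w (slot TEXT, sensorvalue REAL)")
conn.executemany("INSERT INTO t VALUES ('09:00', ?)", [(19.0,), (20.0,), (21.0,)])
conn.executemany("INSERT INTO w VALUES ('09:00', ?)", [(30.0,), (32.0,)])
conn.execute("CREATE TABLE grid (slot TEXT)")
conn.execute("INSERT INTO grid VALUES ('09:00')")

# Joining both raw tables at once: 3 t rows x 2 w rows = 6 rows in the slot.
joined = conn.execute("""
    SELECT count(*), sum(t.sensorvalue)
    FROM grid
    LEFT JOIN t ON t.slot = grid.slot
    LEFT JOIN w ON w.slot = grid.slot
""").fetchone()

# Aggregating each sensor in its own subquery first, then joining one row per slot.
separate = conn.execute("""
    SELECT d1.n, d1.total, d2.sensorvalue
    FROM (SELECT grid.slot, count(t.sensorvalue) AS n, sum(t.sensorvalue) AS total
          FROM grid LEFT JOIN t ON t.slot = grid.slot
          GROUP BY grid.slot) d1
    LEFT JOIN
         (SELECT grid.slot, avg(w.sensorvalue) AS sensorvalue
          FROM grid LEFT JOIN w ON w.slot = grid.slot
          GROUP BY grid.slot) d2 ON d1.slot = d2.slot
""").fetchone()

print(joined)    # (6, 120.0): every t reading counted twice
print(separate)  # (3, 60.0, 31.0)
```

With per-sensor subqueries each side contributes exactly one row per slot, so the final join cannot multiply anything.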
It sounds like you want something like this:
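-- wrap each IS NOT NULL in parentheses before the ::int cast;
-- NULLIF returns NULL instead of a division-by-zero error when all values are NULL
(coalesce(value1,0) + coalesce(value2,0) + coalesce(value3,0)) /
NULLIF((value1 IS NOT NULL)::int + (value2 IS NOT NULL)::int + (value3 IS NOT NULL)::int, 0)
AS average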
Basically, just do the math you want for each row. The only "tricky" part is how to "count" the non-NULL values: I used a cast from boolean to integer, but there are other options, such as:
CASE WHEN value1 IS NULL THEN 0 ELSE 1 END
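Since the question's SQL is generated from Python, the per-row expression can be built for any number of sensor columns instead of being written by hand. The helper below is a hypothetical sketch: `avg_expression` and the column names are made up. It uses the portable `CASE WHEN ... IS NULL` counting form so it also runs on SQLite for the quick check at the end, and it wraps the divisor in `NULLIF(..., 0)` so a row where every sensor is NULL yields NULL rather than a division-by-zero error in PostgreSQL. Column names are interpolated verbatim, so they must come from trusted code, not user input.

```python
import sqlite3


def avg_expression(columns):
    """Build a SQL expression averaging the non-NULL values of the given columns.

    The numerator treats NULL as 0 via coalesce(); the denominator counts only
    the non-NULL columns, so genuine 0 readings are still included.
    Column names are inserted verbatim: only pass trusted identifiers.
    """
    if not columns:
        raise ValueError("need at least one column")
    total = " + ".join(f"coalesce({c}, 0)" for c in columns)
    count = " + ".join(f"CASE WHEN {c} IS NULL THEN 0 ELSE 1 END" for c in columns)
    return f"({total}) / NULLIF({count}, 0)"


# Quick check on SQLite with a hypothetical three-sensor table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slots (t5 TEXT, s1 REAL, s2 REAL, s3 REAL)")
conn.executemany(
    "INSERT INTO slots VALUES (?, ?, ?, ?)",
    [
        ("09:00", 20.0, None, 22.0),
        ("09:05", 0.0, 3.0, None),
        ("09:10", None, None, None),
    ],
)
sql = f"SELECT t5, {avg_expression(['s1', 's2', 's3'])} AS average FROM slots ORDER BY t5"
result = conn.execute(sql).fetchall()
print(result)
```

Adding a fourth sensor is then just one more name in the list passed to the helper; the all-NULL 09:10 row comes back as `None`.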