Sum different date intervals year over year
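I have a number of stores for which I want to sum energy consumption so far this year and compare it with the same period last year. My challenge is that, in the current year, the stores have delivered data for different date intervals. For example, store A may have data between 01.01.2018 and 21.02.2018, while store B may have data between 01.01.2018 and 28.01.2018. I want to sum the same date intervals for the current year and the previous year.
The data looks like this: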
Store Date Sum
A 01.01.2018 12
A 20.01.2018 11
B 01.01.2018 33
B 28.01.2018 32
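but with millions of rows. I want to use these dates as the reference for getting the sums for the previous year.
Here is my (erroneous) attempt: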
SET @curryear = (SELECT YEAR(MAX(start_date)) FROM energy_data);
SET @maxdate_curryear = (SELECT MAX(start_date) FROM energy_data WHERE
YEAR(start_date) = @curryear);
SET @mindate_curryear = (SELECT MIN(start_date) FROM energy_data WHERE
YEAR(start_date) = @curryear);
-- the same date intervals last year
SET @maxdate_prevyear = (@maxdate_curryear - INTERVAL 1 YEAR);
SET @mindate_prevyear = (@mindate_curryear - INTERVAL 1 YEAR);
-- sums current year
CREATE TABLE t_sum_curr AS
SELECT name as name_curr, sum(kwh) as sum_curr, min(start_date) AS
min_date_curr, max(start_date) AS max_date_curr, count(distinct
start_date) AS ant_timer FROM energy_data WHERE agg_type = 'timesnivå'
AND start_date >= @mindate_curryear and start_date <= @maxdate_curryear GROUP BY NAME;
-- also seems fair, the same dates one year ago, figured I should find those first and in the next query use that to sum each stores between those date intervals
CREATE TABLE t_sum_prev AS
SELECT name_curr as name_curr2, (min_date_curr - INTERVAL 1 YEAR) AS
min_date_prev, (max_date_curr - INTERVAL 1 YEAR) as max_date_prev FROM
t_sum_curr;
-- getting into trouble!
CREATE TABLE the_results AS
SELECT name, start_date, sum(kwh) as sum_prev from energy_data where
agg_type = 'timesnivå' and
start_date >= @mindate_prevyear and start_date <=
@maxdate_prevyear group by name having start_date BETWEEN (SELECT
min_date_prev from t_sum_prev) AND
(SELECT max_date_prev from t_sum_prev);
The last query just tells me that my subquery returns more than 1 row and throws an error message.
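I'm assuming that what you have is a list of energy consumption data where the bills or readings were taken at irregular times, so each consumption figure covers an irregular period.
The basic approach you need to take is to regularize the consumption periods: determine the number of days each period covers, then break each reading down into the days it covers, with the consumption for each day being the daily average for that period.
I'm assuming the consumption periods are fully contiguous (as bills or normal readings would be) and do not overlap.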
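The proration step described above can be sketched roughly as follows. This is an illustration in Python rather than SQL, under the stated assumption that periods are contiguous and non-overlapping. The function name `prorate_daily`, the tuple layout, and the sample figures are all hypothetical and not taken from the original data.

```python
from datetime import date, timedelta

def prorate_daily(readings):
    """Spread each reading evenly over the days it covers.

    `readings` is a list of (period_start, period_end, kwh) tuples with
    inclusive dates (an assumed layout). Returns {day: average daily kWh}.
    """
    daily = {}
    for start, end, kwh in readings:
        days = (end - start).days + 1  # inclusive day count
        per_day = kwh / days           # daily average for this period
        for offset in range(days):
            daily[start + timedelta(days=offset)] = per_day
    return daily

# Hypothetical sample: two irregular periods for one store.
readings = [
    (date(2018, 1, 1), date(2018, 1, 10), 100.0),  # 10 days -> 10 kWh/day
    (date(2018, 1, 11), date(2018, 1, 14), 20.0),  # 4 days  -> 5 kWh/day
]
daily = prorate_daily(readings)
```

Once every day carries its own figure, any two date ranges can be compared by summing the days they contain.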
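Because of the number of rows involved (you mention millions even in the current format), you probably won't want to keep the data in daily form. You may want to regroup it into regular weekly, monthly, or quarterly periods, depending on the level of granularity your comparison needs.
Once you have regular periods, the comparison becomes a piece of cake.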
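As a rough illustration of why regular periods make the comparison easy, the sketch below groups daily figures into monthly buckets and pairs each month with the same month one year earlier. It is written in Python for brevity; `monthly_totals`, `year_over_year`, and the sample values are hypothetical names and data, not part of the original question.

```python
from collections import defaultdict
from datetime import date

def monthly_totals(daily):
    """Sum a {date: kwh} mapping into {(year, month): kwh} buckets."""
    totals = defaultdict(float)
    for day, kwh in daily.items():
        totals[(day.year, day.month)] += kwh
    return dict(totals)

def year_over_year(totals, year):
    """Pair each month of `year` with the same month of the previous year.

    Months missing from either year are skipped, so only periods that
    both years cover are compared.
    """
    return {
        month: (totals[(year, month)], totals[(year - 1, month)])
        for (y, month) in totals
        if y == year and (year - 1, month) in totals
    }

# Hypothetical daily figures: 2 kWh/day in January 2017, 3 kWh/day in January 2018.
daily = {date(2017, 1, d): 2.0 for d in range(1, 32)}
daily.update({date(2018, 1, d): 3.0 for d in range(1, 32)})
daily[date(2018, 2, 1)] = 3.0  # February 2018 has no 2017 counterpart here

comparison = year_over_year(monthly_totals(daily), 2018)
```

A partially filled month would still need care (for example, comparing only up to the last day with data), which this sketch does not attempt.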
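If this is part of a report that will be run on an ongoing basis, you may want to implement some logic that calculates the "regularized consumption" incrementally on a schedule and stores it in a summary table, with appropriate columns and indexes, so that you don't have to process millions of historical rows every time the report runs.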
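One possible shape for such an incremental refresh is sketched below. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example is self-contained; the original question targets MySQL, so the date functions would differ there. The summary table `energy_daily_summary`, its columns, the column types, and the function `refresh_summary` are assumptions made for illustration.

```python
import sqlite3

def refresh_summary(conn):
    """Fold rows from the last summarized day onward into the summary table."""
    cur = conn.cursor()
    # Watermark: the latest day already summarized (None on the first run).
    (last_day,) = cur.execute(
        "SELECT MAX(day) FROM energy_daily_summary"
    ).fetchone()
    last_day = last_day or ""  # empty string sorts before every date string
    # Recompute the watermark day too, since it may have been incomplete
    # when the previous run happened.
    cur.execute("DELETE FROM energy_daily_summary WHERE day >= ?", (last_day,))
    cur.execute(
        """
        INSERT INTO energy_daily_summary (name, day, kwh)
        SELECT name, DATE(start_date), SUM(kwh)
        FROM energy_data
        WHERE DATE(start_date) >= ?
        GROUP BY name, DATE(start_date)
        """,
        (last_day,),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE energy_data (name TEXT, start_date TEXT, kwh REAL)")
conn.execute(
    "CREATE TABLE energy_daily_summary "
    "(name TEXT, day TEXT, kwh REAL, PRIMARY KEY (name, day))"
)
conn.executemany("INSERT INTO energy_data VALUES (?, ?, ?)", [
    ("A", "2018-01-01 00:00:00", 1.0),
    ("A", "2018-01-01 01:00:00", 2.0),
])
refresh_summary(conn)

# More hourly rows arrive later; only the affected days are recomputed.
conn.executemany("INSERT INTO energy_data VALUES (?, ?, ?)", [
    ("A", "2018-01-01 02:00:00", 3.0),
    ("A", "2018-01-02 00:00:00", 4.0),
])
refresh_summary(conn)

rows = conn.execute(
    "SELECT name, day, kwh FROM energy_daily_summary ORDER BY day"
).fetchall()
```

The report can then read from the small summary table, and each scheduled run touches only the days from the watermark forward.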
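Trying to work around the irregular periods with fancy joins and on-the-fly averaging (if that can be done at all), rather than tackling them head-on, will probably lead to very difficult logic and, especially on a dataset of this size, terrible performance.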
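Edit: following the comments below.
@Alexander, I've put together an example query. I haven't tested it, and I wrote it entirely in a text editor, so please forgive any minor syntax errors. What I came up with seems a bit complicated (more so than I imagined when I started), but I'm also a bit tired, so I'm not sure whether it can be simplified further.
The only thing I'd add is that, because of the nature of what this query (or any query like it) has to do when traversing date ranges, its performance on a table with millions of rows may be frightening. I stand by my earlier remarks: proper indexing of the source data is crucial, and summarizing the source data to a coarser granularity will improve performance enormously (at the cost of summarizing it once). Even daily granularity would cut the number of rows by a factor of 24!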
WITH energy_data_ext AS
(
SELECT
ed.name AS store_name
,YEAR(ed.start_date) AS reading_year
,ed.start_date AS reading_date
,ed.kwh AS reading_kwh
FROM
energy_data AS ed
)
,available_stores AS
(
SELECT ede.store_name
FROM energy_data_ext AS ede
GROUP BY ede.store_name
)
,current_reading_yr_per_store AS
(
SELECT
ede.store_name
,MAX(ede.reading_year) AS current_reading_year
FROM
energy_data_ext AS ede
GROUP BY
ede.store_name
)
,latest_reading_ranges_per_year AS
(
SELECT
ede.store_name
,ede.reading_year
,MAX(ede.reading_date) AS latest_reading_date_of_yr
FROM
energy_data_ext AS ede
GROUP BY
ede.store_name
,ede.reading_year
)
,store_reading_ranges AS
(
SELECT
avs.store_name
,lryps.current_reading_year
,lyrr.latest_reading_date_of_yr AS current_year_latest_reading_date
,(lryps.current_reading_year - 1) AS prev_reading_year
,(lyrr.latest_reading_date_of_yr - INTERVAL 1 YEAR) AS prev_year_latest_reading_date
FROM
available_stores AS avs
LEFT JOIN
current_reading_yr_per_store AS lryps
ON (lryps.store_name = avs.store_name)
LEFT JOIN
latest_reading_ranges_per_year AS lyrr
ON (lyrr.store_name = avs.store_name)
AND (lyrr.reading_year = lryps.current_reading_year)
)
-- At this stage, we should have all the calculations we need to
-- establish the range for the latest year, and the range for the year prior to that.
,current_year_consumption AS
(
SELECT
avs.store_name
,SUM(cyed.reading_kwh) AS latest_year_kwh
FROM
available_stores AS avs
LEFT JOIN
store_reading_ranges AS srs
ON (srs.store_name = avs.store_name)
LEFT JOIN
energy_data_ext AS cyed
ON (cyed.store_name = avs.store_name)
AND (cyed.reading_year = srs.current_reading_year)
AND (cyed.reading_date <= srs.current_year_latest_reading_date)
GROUP BY
avs.store_name
)
,prev_year_consumption AS
(
SELECT
avs.store_name
,SUM(pyed.reading_kwh) AS prev_year_kwh
FROM
available_stores AS avs
LEFT JOIN
store_reading_ranges AS srs
ON (srs.store_name = avs.store_name)
LEFT JOIN
energy_data_ext AS pyed
ON (pyed.store_name = avs.store_name)
AND (pyed.reading_year = srs.prev_reading_year)
AND (pyed.reading_date <= srs.prev_year_latest_reading_date)
GROUP BY
avs.store_name
)
SELECT
avs.store_name
,srs.current_reading_year
,srs.current_year_latest_reading_date
,lyc.latest_year_kwh
,srs.prev_reading_year
,srs.prev_year_latest_reading_date
,pyc.prev_year_kwh
FROM
available_stores AS avs
LEFT JOIN
store_reading_ranges AS srs
ON (srs.store_name = avs.store_name)
LEFT JOIN
current_year_consumption AS lyc
ON (lyc.store_name = avs.store_name)
LEFT JOIN
prev_year_consumption AS pyc
ON (pyc.store_name = avs.store_name);
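To make the intent of the query above easier to check on a small scale, here is a minimal Python sketch of the same idea: for each store, take its latest reading date as the cutoff, sum the current year up to that cutoff, and sum the previous year up to the same day one year earlier. The 2018 rows come from the sample in the question; the 2017 rows and the names `compare_ytd` and `one_year_earlier` are hypothetical and only illustrate the logic.

```python
from datetime import date

def one_year_earlier(day):
    """Same calendar day in the previous year (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:  # 29 February has no counterpart in a non-leap year
        return day.replace(year=day.year - 1, day=28)

def compare_ytd(rows):
    """rows: list of (store, reading_date, kwh) tuples.

    Returns {store: (cutoff, current_sum, prev_cutoff, prev_sum)}.
    """
    # Latest reading date per store defines that store's own cutoff.
    latest = {}
    for store, day, _ in rows:
        if store not in latest or day > latest[store]:
            latest[store] = day

    result = {}
    for store, cutoff in latest.items():
        prev_cutoff = one_year_earlier(cutoff)
        curr = sum(k for s, d, k in rows
                   if s == store and d.year == cutoff.year and d <= cutoff)
        prev = sum(k for s, d, k in rows
                   if s == store and d.year == prev_cutoff.year and d <= prev_cutoff)
        result[store] = (cutoff, curr, prev_cutoff, prev)
    return result

rows = [
    # Sample rows from the question (2018).
    ("A", date(2018, 1, 1), 12), ("A", date(2018, 1, 20), 11),
    ("B", date(2018, 1, 1), 33), ("B", date(2018, 1, 28), 32),
    # Hypothetical 2017 rows; readings after each store's cutoff are excluded.
    ("A", date(2017, 1, 5), 10), ("A", date(2017, 1, 25), 99),
    ("B", date(2017, 1, 25), 30), ("B", date(2017, 2, 1), 99),
]
result = compare_ytd(rows)
```

Each store is matched only against its own rows and its own cutoff, which is what lets stores with different reporting intervals be compared fairly.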