
Sum different date intervals year over year

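I have a number of stores for which I want to sum the energy consumption so far this year and compare it with the same period last year. My challenge is that, in the current year, the stores have delivered data for different date intervals. Store A might have data from 01.01.2018 to 21.02.2018, while store B might have data from 01.01.2018 to 28.01.2018. I want to sum the same date interval in the current year and in the previous year, per store.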

The data looks like this:

Store   Date    Sum
A   01.01.2018  12
A   20.01.2018  11
B   01.01.2018  33
B   28.01.2018  32

But with millions of rows. I want to use these dates as a reference to get the sums for the previous year.

This is my (flawed) attempt:

SET @curryear = (SELECT YEAR(MAX(start_date)) FROM energy_data);
SET @maxdate_curryear = (SELECT MAX(start_date) FROM energy_data WHERE 
YEAR(start_date) = @curryear);
SET @mindate_curryear = (SELECT MIN(start_date) FROM energy_data WHERE 
YEAR(start_date) = @curryear);

-- the same date intervals last year

SET @maxdate_prevyear = (@maxdate_curryear - INTERVAL 1 YEAR); 
SET @mindate_prevyear = (@mindate_curryear - INTERVAL 1 YEAR); 

-- sums current year

CREATE TABLE t_sum_curr AS
SELECT name as name_curr, sum(kwh) as sum_curr, min(start_date) AS 
min_date_curr, max(start_date) AS max_date_curr, count(distinct 
start_date) AS ant_timer FROM energy_data WHERE agg_type = 'timesnivå' 
AND start_date >= @mindate_curryear and start_date <= @maxdate_curryear GROUP BY NAME; 

-- also seems fair, the same dates one year ago, figured I should find those first and in the next query use that to sum each stores between those date intervals

CREATE TABLE t_sum_prev AS
SELECT name_curr as name_curr2, (min_date_curr - INTERVAL 1 YEAR) AS 
min_date_prev, (max_date_curr - INTERVAL 1 YEAR) as max_date_prev FROM 
t_sum_curr;

-- getting into trouble!

CREATE TABLE the_results AS
SELECT name, start_date, sum(kwh) as sum_prev from energy_data where 
agg_type = 'timesnivå' and
            start_date >= @mindate_prevyear and start_date <= 
@maxdate_prevyear group by name having start_date BETWEEN (SELECT 
min_date_prev from t_sum_prev) AND                                                                      
(SELECT max_date_prev from t_sum_prev);

The last query just tells me that my subquery returns more than 1 row and throws an error.

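I assume that what you have is a list of energy consumption data in which the bills or readings were taken at irregular times, so each consumption figure covers an irregular period.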

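The basic approach is to regularize the consumption periods: determine the number of days each period covers, then split each reading across those days, so that each day's consumption is the daily average for its period.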

I'm assuming the consumption periods are fully contiguous (as bills or normal readings would be) and do not overlap.
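The day-by-day split described above could look roughly like the sketch below. It is only an illustration: the `(period_start, period_end, kwh)` layout, the half-open period convention, and the name `spread_daily` are assumptions, not the asker's schema.

```python
from datetime import date, timedelta

def spread_daily(readings):
    """Spread each reading evenly over the days it covers.

    readings: iterable of (period_start, period_end, kwh), where the period
    is the half-open range [period_start, period_end). This layout is an
    assumption for illustration, not the asker's schema.
    Returns a dict mapping each day to its share of the consumption.
    """
    daily = {}
    for start, end, kwh in readings:
        days = (end - start).days
        if days <= 0:
            raise ValueError("a period must cover at least one day")
        per_day = kwh / days  # daily average for this period
        for offset in range(days):
            day = start + timedelta(days=offset)
            daily[day] = daily.get(day, 0.0) + per_day
    return daily

# Two contiguous, non-overlapping periods (made-up numbers).
readings = [
    (date(2018, 1, 1), date(2018, 1, 11), 100.0),  # 10 days
    (date(2018, 1, 11), date(2018, 1, 16), 25.0),  # 5 days
]
daily = spread_daily(readings)
```

Because the periods are assumed not to overlap, every day receives a contribution from exactly one reading, and the daily values add back up to the original total.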

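Because of the number of rows involved (you say millions even in the current format), you probably won't want to keep the data in daily form. You may need to regroup it into regular weeks, months, or quarters, depending on the level of granularity you need to compare.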
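As a rough illustration of that regrouping step, the sketch below rolls daily values up into calendar months. The function name `monthly_totals` and the sample values are made up; weekly or quarterly buckets would only need a different key.

```python
from collections import defaultdict
from datetime import date

def monthly_totals(daily):
    """Sum a {day: kwh} mapping into {(year, month): kwh} buckets."""
    totals = defaultdict(float)
    for day, kwh in daily.items():
        totals[(day.year, day.month)] += kwh
    return dict(totals)

# Made-up daily values that straddle a month boundary.
daily = {
    date(2018, 1, 30): 4.0,
    date(2018, 1, 31): 6.0,
    date(2018, 2, 1): 5.0,
}
totals = monthly_totals(daily)
```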

Once you have regular periods, the comparison is a piece of cake.

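If this is part of a report that will run on an ongoing basis, you may want to implement some logic that computes the "regularized consumption" incrementally on a schedule and stores it in a summary table with appropriate columns and indexes, so that you don't have to process millions of historical rows every time the report runs.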
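One way such an incremental refresh might be structured is sketched below, using Python's built-in `sqlite3` so it can be run as-is. The `daily_summary` table, its columns, and the function `refresh_daily_summary` are hypothetical, and the SQL is SQLite dialect rather than MySQL. The watermark is kept per store, since stores deliver data up to different dates.

```python
import sqlite3

def refresh_daily_summary(conn):
    """Summarize only days newer than each store's latest summarized day.

    Assumes a hypothetical daily_summary(name, day, kwh) table next to
    the asker's energy_data(name, start_date, kwh).
    """
    conn.execute("""
        INSERT INTO daily_summary (name, day, kwh)
        SELECT e.name, date(e.start_date), SUM(e.kwh)
        FROM energy_data AS e
        WHERE date(e.start_date) > COALESCE(
            (SELECT MAX(s.day) FROM daily_summary AS s WHERE s.name = e.name),
            '')
        GROUP BY e.name, date(e.start_date)
    """)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE energy_data (name TEXT, start_date TEXT, kwh REAL);
    CREATE TABLE daily_summary (
        name TEXT, day TEXT, kwh REAL, PRIMARY KEY (name, day));
    INSERT INTO energy_data VALUES
        ('A', '2018-01-01 00:00:00', 5), ('A', '2018-01-01 01:00:00', 7),
        ('B', '2018-01-01 00:00:00', 3);
""")
refresh_daily_summary(conn)  # first scheduled run

# Store B delivers another day later on; only that day is added.
conn.execute("INSERT INTO energy_data VALUES ('B', '2018-01-02 00:00:00', 4)")
refresh_daily_summary(conn)  # second scheduled run

rows = conn.execute(
    "SELECT name, day, kwh FROM daily_summary ORDER BY name, day"
).fetchall()
```

A limitation of this sketch: a day that was summarized while only partly delivered is not revisited, so a real job would probably need to recompute each store's most recent day or wait until a day is complete.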

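Trying to work around the irregular periods with fancy joins and on-the-fly averaging (if that can be done at all), rather than tackling them directly, is likely to produce very difficult logic and, particularly on a dataset of this size, terrible performance.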

Edit: following the comments below.

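@Alexander, I've put together an example query. I haven't tested it; I wrote it entirely in a text editor, so please forgive any small syntax errors. What I came up with seems a bit complicated (more so than I imagined when I started), but I'm also a bit tired, so I'm not sure whether it can be simplified further.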

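The only thing I would say is that, because of what this query (or any query of this kind) has to do when traversing date ranges, its performance on a table with millions of rows may be frightening. I stand by my earlier remarks: proper indexing of the source data is crucial, and aggregating the source data to a coarser granularity will improve performance greatly (at the cost of a one-off aggregation). Even daily granularity would reduce the number of rows by a factor of 24!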

WITH energy_data_ext AS
(
    SELECT
        ed.name                 AS store_name
        ,YEAR(ed.start_date)    AS reading_year
        ,ed.start_date          AS reading_date
        ,ed.kwh                 AS reading_kwh
    FROM
        energy_data AS ed
)

,available_stores AS
(
    SELECT ede.store_name
    FROM energy_data_ext AS ede
    GROUP BY ede.store_name
)

,current_reading_yr_per_store AS
(
    SELECT
        ede.store_name
        ,MAX(ede.reading_year)  AS current_reading_year
    FROM
        energy_data_ext AS ede
    GROUP BY 
        ede.store_name
)

,latest_reading_ranges_per_year AS
(
    SELECT
        ede.store_name
        ,ede.reading_year
        ,MAX(ede.reading_date) AS latest_reading_date_of_yr
    FROM
        energy_data_ext AS ede
    GROUP BY
        ede.store_name
        ,ede.reading_year
)

,store_reading_ranges AS
(
    SELECT
        avs.store_name
        ,lryps.current_reading_year
        ,lyrr.latest_reading_date_of_yr AS current_year_latest_reading_date

        ,(lryps.current_reading_year - 1)                   AS prev_reading_year
        ,(lyrr.latest_reading_date_of_yr - INTERVAL 1 YEAR) AS prev_year_latest_reading_date

    FROM
        available_stores AS avs

    LEFT JOIN
        current_reading_yr_per_store AS lryps
        ON (lryps.store_name = avs.store_name)

    LEFT JOIN
        latest_reading_ranges_per_year AS lyrr
        ON (lyrr.store_name = avs.store_name)
        AND (lyrr.reading_year = lryps.current_reading_year)
)

-- At this stage, we should have all the calculations we need to
-- establish the range for the latest year, and the range for the year prior to that.

,current_year_consumption AS
(
    SELECT
        avs.store_name
        ,SUM(cyed.reading_kwh) AS latest_year_kwh

    FROM
        available_stores AS avs

    LEFT JOIN
        store_reading_ranges AS srs
        ON (srs.store_name = avs.store_name)

    LEFT JOIN
        energy_data_ext AS cyed
        ON (cyed.store_name = srs.store_name)   -- restrict to this store's own readings
        AND (cyed.reading_year = srs.current_reading_year)
        AND (cyed.reading_date <= srs.current_year_latest_reading_date)

    GROUP BY
        avs.store_name
)

,prev_year_consumption AS
(
    SELECT
        avs.store_name
        ,SUM(pyed.reading_kwh) AS prev_year_kwh

    FROM
        available_stores AS avs

    LEFT JOIN
        store_reading_ranges AS srs
        ON (srs.store_name = avs.store_name)

    LEFT JOIN
        energy_data_ext AS pyed
        ON (pyed.store_name = srs.store_name)   -- restrict to this store's own readings
        AND (pyed.reading_year = srs.prev_reading_year)
        AND (pyed.reading_date <= srs.prev_year_latest_reading_date)

    GROUP BY
        avs.store_name
)

SELECT
    avs.store_name

    ,srs.current_reading_year
    ,srs.current_year_latest_reading_date
    ,lyc.latest_year_kwh

    ,srs.prev_reading_year
    ,srs.prev_year_latest_reading_date
    ,pyc.prev_year_kwh

FROM
    available_stores AS avs

LEFT JOIN
    store_reading_ranges AS srs
    ON (srs.store_name = avs.store_name)

LEFT JOIN
    current_year_consumption AS lyc
    ON (lyc.store_name = avs.store_name)

LEFT JOIN
    prev_year_consumption AS pyc
    ON (pyc.store_name = avs.store_name)
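
The per-store range logic can be checked on a small dataset before running it against millions of rows. The sketch below is one possible way to do that with Python's built-in `sqlite3`. It is not the query above: it is a shorter variant in SQLite dialect, where `date(x, '-1 year')` stands in for MySQL's `x - INTERVAL 1 YEAR`. Table and column names follow the question; the rows and the `:year` parameter are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE energy_data (name TEXT, start_date TEXT, kwh REAL);
    INSERT INTO energy_data VALUES
        ('A', '2018-01-01', 12), ('A', '2018-01-20', 11),
        ('A', '2017-01-01', 10), ('A', '2017-01-20', 9),
        ('A', '2017-02-15', 50),   -- outside A's shifted range
        ('B', '2018-01-01', 33), ('B', '2018-01-28', 32),
        ('B', '2017-01-01', 30), ('B', '2017-01-28', 31),
        ('B', '2017-01-29', 99);   -- outside B's shifted range
""")

query = """
WITH ranges AS (
    -- one row per store: the date range it has delivered this year
    SELECT name,
           MIN(start_date) AS min_curr,
           MAX(start_date) AS max_curr
    FROM energy_data
    WHERE strftime('%Y', start_date) = :year
    GROUP BY name
)
SELECT r.name,
       SUM(CASE WHEN e.start_date BETWEEN r.min_curr AND r.max_curr
                THEN e.kwh END) AS sum_curr,
       SUM(CASE WHEN e.start_date BETWEEN date(r.min_curr, '-1 year')
                                      AND date(r.max_curr, '-1 year')
                THEN e.kwh END) AS sum_prev
FROM ranges AS r
JOIN energy_data AS e ON e.name = r.name   -- join, not a scalar subquery
GROUP BY r.name
ORDER BY r.name
"""
rows = conn.execute(query, {"year": "2018"}).fetchall()
for name, sum_curr, sum_prev in rows:
    print(name, sum_curr, sum_prev)
```

Joining each store to its own range row is what avoids the "subquery returns more than 1 row" error from the question, where a scalar subquery was asked to return one range but found one per store.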
