
Query every possible date combination in a table and sum values

I would like to have a table with start_date, end_date, campaign, group, and sum(value) for every possible combination of date ranges.

I tried a cross join where t1.campaign = t2.campaign AND t1.group = t2.group AND t1.date <= t2.date, but couldn't figure out how to get the correct sum of values for those date ranges. Maybe a cross join with some kind of lag function? Not sure.
Any help would be much appreciated.

Current Redshift table

date         campaign      group      value
1/1/2019     campaign_1    control    5
1/2/2019     campaign_1    control    7 
1/4/2019     campaign_1    control    8
1/5/2019     campaign_1    control    14
1/7/2019     campaign_1    control    11

Desired Redshift table

start_date  end_date    campaign    group   value
1/1/19      1/1/19      campaign_1  control 5
1/1/19      1/2/19      campaign_1  control 12
1/1/19      1/3/19      campaign_1  control 12
1/1/19      1/4/19      campaign_1  control 20
1/1/19      1/5/19      campaign_1  control 34
1/1/19      1/6/19      campaign_1  control 34
1/1/19      1/7/19      campaign_1  control 45
1/2/19      1/2/19      campaign_1  control 7
1/2/19      1/3/19      campaign_1  control 7
1/2/19      1/4/19      campaign_1  control 15
1/2/19      1/5/19      campaign_1  control 29
1/2/19      1/6/19      campaign_1  control 29
1/2/19      1/7/19      campaign_1  control 40
1/3/19      1/3/19      campaign_1  control 0
1/3/19      1/4/19      campaign_1  control 8
1/3/19      1/5/19      campaign_1  control 22
1/3/19      1/6/19      campaign_1  control 22
1/3/19      1/7/19      campaign_1  control 33
1/4/19      1/4/19      campaign_1  control 8
1/4/19      1/5/19      campaign_1  control 22
1/4/19      1/6/19      campaign_1  control 22
1/4/19      1/7/19      campaign_1  control 33
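To pin down what each row of the desired table means, here is a brute-force reference in Python. It is a sketch for checking results, not Redshift code, and `ROWS` and `all_range_sums` are hypothetical names. It assumes that every calendar day between a group's first and last observed date is a valid start or end date (the 1/3/19 rows in the desired output imply this), and it sums value over start_date <= date <= end_date.

```python
from datetime import date, timedelta

# Sample rows from the question: (date, campaign, group, value)
ROWS = [
    (date(2019, 1, 1), "campaign_1", "control", 5),
    (date(2019, 1, 2), "campaign_1", "control", 7),
    (date(2019, 1, 4), "campaign_1", "control", 8),
    (date(2019, 1, 5), "campaign_1", "control", 14),
    (date(2019, 1, 7), "campaign_1", "control", 11),
]


def all_range_sums(rows):
    """Sum of value for every (start_date, end_date) pair with
    start_date <= end_date, computed separately per (campaign, group).

    Assumption: every calendar day between the first and last observed
    date is a candidate start/end, so days without rows (1/3, 1/6) appear.
    """
    groups = {}
    for d, campaign, group, value in rows:
        groups.setdefault((campaign, group), []).append((d, value))

    result = {}
    for (campaign, group), points in groups.items():
        first = min(d for d, _ in points)
        last = max(d for d, _ in points)
        days = [first + timedelta(days=n) for n in range((last - first).days + 1)]
        for i, start in enumerate(days):
            for end in days[i:]:
                # Brute force: rescan all points for each range
                total = sum(v for d, v in points if start <= d <= end)
                result[(start, end, campaign, group)] = total
    return result


if __name__ == "__main__":
    sums = all_range_sums(ROWS)
    for key in sorted(sums):
        start, end, campaign, group = key
        print(start, end, campaign, group, sums[key])
```

Seven calendar days give 7 * 8 / 2 = 28 ranges per campaign/group, so the full output is longer than the excerpt above.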

You will need to combine a calendar table containing every date in the range, a sub-query, and a window function to generate a table in the format you want. The general steps I would take are:

  1. Pivot the data set into (start_date, end_date) pairs, without value included.
  2. Join value back onto the pivoted data set.
  3. Perform aggregation via window function.
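The three steps can be mirrored in plain Python to show how the pieces fit together. This is only an illustration of the logic for a single campaign/group, not of Redshift behavior; `DAILY` and `range_sums` are hypothetical names, and `itertools.accumulate` plays the role of the running SUM window.

```python
from datetime import date, timedelta
from itertools import accumulate

# Sample data from the question for one campaign/group: date -> value
DAILY = {
    date(2019, 1, 1): 5,
    date(2019, 1, 2): 7,
    date(2019, 1, 4): 8,
    date(2019, 1, 5): 14,
    date(2019, 1, 7): 11,
}


def range_sums(daily):
    first, last = min(daily), max(daily)
    calendar = [first + timedelta(days=n) for n in range((last - first).days + 1)]

    # Step 1: (start_date, end_date) pairs, with no value yet
    pairs = [(s, e) for i, s in enumerate(calendar) for e in calendar[i:]]

    # Step 2: "LEFT JOIN" the value for end_date; days without rows become 0
    joined = [(s, e, daily.get(e, 0)) for s, e in pairs]

    # Step 3: running total per start_date ordered by end_date, like
    # SUM(value) OVER (PARTITION BY start_date ORDER BY end_date ROWS UNBOUNDED PRECEDING)
    out = []
    for s in calendar:
        part = [(e, v) for s2, e, v in joined if s2 == s]  # already in end_date order
        totals = accumulate(v for _, v in part)
        out.extend((s, e, t) for (e, _), t in zip(part, totals))
    return out


if __name__ == "__main__":
    for start, end, total in range_sums(DAILY):
        print(start, end, total)
```

Step 2 only attaches the value of the end_date itself; the window in step 3 is what turns those daily values into range totals.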

The following query is meant to produce the output you specify from the base table you defined. I have renamed some columns to avoid keyword conflicts (group becomes group_name and value becomes value_amt), and table_name stands in for your actual table:

-- Generating a base calendar table from cartesian integer joins
WITH month_numerals AS (
    SELECT 1 AS month_numeral, 31 AS month_end_numeral
    UNION SELECT 2, 28
    UNION SELECT 3, 31
    UNION SELECT 4, 30
    UNION SELECT 5, 31
    UNION SELECT 6, 30
    UNION SELECT 7, 31
    UNION SELECT 8, 31
    UNION SELECT 9, 30
    UNION SELECT 10, 31
    UNION SELECT 11, 30
    UNION SELECT 12, 31
),

years AS (
    SELECT 2018 AS year_numeral
    UNION SELECT 2019
    UNION SELECT 2020
),

days AS (
    SELECT 1 AS day_numeral
    UNION SELECT 2
    UNION SELECT 3
    UNION SELECT 4
    UNION SELECT 5
    UNION SELECT 6
    UNION SELECT 7
    UNION SELECT 8
    UNION SELECT 9
    UNION SELECT 10
    UNION SELECT 11
    UNION SELECT 12
    UNION SELECT 13
    UNION SELECT 14
    UNION SELECT 15
    UNION SELECT 16
    UNION SELECT 17
    UNION SELECT 18
    UNION SELECT 19
    UNION SELECT 20
    UNION SELECT 21
    UNION SELECT 22
    UNION SELECT 23
    UNION SELECT 24
    UNION SELECT 25
    UNION SELECT 26
    UNION SELECT 27
    UNION SELECT 28
    UNION SELECT 29
    UNION SELECT 30
    UNION SELECT 31
),

base_calendar_numerals AS (
    SELECT year_numeral
         , month_numeral
         , day_numeral
         -- Gregorian leap years: February has 29 days when the year is divisible by 4 but not by 100, or is divisible by 400
         , CASE WHEN ((MOD(year_numeral, 4) = 0 AND MOD(year_numeral, 100) != 0)
                    OR MOD(year_numeral, 400) = 0) AND month_numeral = 2 THEN 29
                ELSE month_end_numeral
           END AS month_end_number_calculated
    FROM years
        CROSS JOIN month_numerals
        CROSS JOIN days
    WHERE day_numeral <= month_end_number_calculated
),

base_calendar AS (
    -- Building a YYYYMMDD string from the integer parts and parsing it with TO_DATE()
    SELECT TO_DATE(to_char(year_numeral, 'FM0000')
        || to_char(month_numeral, 'FM00')
        || to_char(day_numeral, 'FM00'), 'YYYYMMDD') AS base_date
    FROM base_calendar_numerals
),

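date_bounds AS (
    -- First and last observed date for each campaign/group
    SELECT campaign
         , group_name
         , MIN(date) AS first_date
         , MAX(date) AS end_date
    FROM table_name
    GROUP BY 1, 2
),

pivoted_base_data AS (
    -- Every calendar date in the observed range is a candidate start_date,
    -- including dates with no rows (e.g. 1/3/2019 in the sample)
    SELECT b.campaign
         , b.group_name
         , c.base_date AS start_date
         , b.end_date
    FROM date_bounds b
        JOIN base_calendar c ON c.base_date BETWEEN b.first_date AND b.end_date
),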

transposed_base_data AS (
    SELECT campaign
         , group_name
         , start_date
         , base_date AS end_date
    FROM pivoted_base_data p
        JOIN base_calendar c ON c.base_date BETWEEN p.start_date AND p.end_date
    GROUP BY 1, 2, 3, 4
),

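raw_result AS (
    -- Daily total for the end_date of each (start_date, end_date) pair;
    -- SUM handles duplicate rows on the same date, COALESCE turns missing dates into 0
    SELECT t.start_date
         , t.end_date
         , t.campaign
         , t.group_name
         , COALESCE(SUM(n.value_amt), 0) AS value_amt
    FROM transposed_base_data t
        LEFT JOIN table_name n ON (t.campaign = n.campaign
                                  AND t.group_name = n.group_name
                                  AND n.date = t.end_date)
    GROUP BY 1, 2, 3, 4
)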

SELECT start_date
     , end_date
     , campaign
     , group_name
     , SUM(value_amt) OVER (PARTITION BY campaign, group_name, start_date
                            ORDER BY end_date
                            ROWS UNBOUNDED PRECEDING) AS value_amt
FROM raw_result
ORDER BY campaign, group_name, start_date, end_date
;
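A shorter alternative, swapped in here rather than taken from the answer above, is to finish the cross join from the question with a range join: pair every start/end date, LEFT JOIN the base table on date BETWEEN start_date AND end_date, and GROUP BY. The sketch below runs that query against SQLite through Python's built-in `sqlite3` module purely so it can be executed locally; SQLite's `date(day, '+1 day')` and its recursive CTE calendar are assumptions that do not carry over to Redshift as-is, where you would reuse the base_calendar CTE from the answer instead. `SAMPLE`, `QUERY`, and `run` are hypothetical names.

```python
import sqlite3

# Sample data from the question, using the answer's renamed columns
SAMPLE = [
    ("2019-01-01", "campaign_1", "control", 5),
    ("2019-01-02", "campaign_1", "control", 7),
    ("2019-01-04", "campaign_1", "control", 8),
    ("2019-01-05", "campaign_1", "control", 14),
    ("2019-01-07", "campaign_1", "control", 11),
]

QUERY = """
WITH RECURSIVE bounds AS (
    SELECT campaign, group_name, MIN(date) AS first_date, MAX(date) AS last_date
    FROM table_name
    GROUP BY campaign, group_name
),
calendar(campaign, group_name, day, last_date) AS (
    -- SQLite-specific calendar: one row per day per campaign/group
    SELECT campaign, group_name, first_date, last_date FROM bounds
    UNION ALL
    SELECT campaign, group_name, date(day, '+1 day'), last_date
    FROM calendar
    WHERE day < last_date
)
SELECT s.day AS start_date,
       e.day AS end_date,
       s.campaign,
       s.group_name,
       COALESCE(SUM(n.value_amt), 0) AS value_amt
FROM calendar s
JOIN calendar e                       -- every start_date <= end_date pair
  ON e.campaign = s.campaign
 AND e.group_name = s.group_name
 AND e.day >= s.day
LEFT JOIN table_name n                -- all rows inside the range
  ON n.campaign = s.campaign
 AND n.group_name = s.group_name
 AND n.date BETWEEN s.day AND e.day
GROUP BY s.day, e.day, s.campaign, s.group_name
ORDER BY s.campaign, s.group_name, s.day, e.day
"""


def run(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table_name (date TEXT, campaign TEXT, group_name TEXT, value_amt INTEGER)"
    )
    con.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in run(SAMPLE):
        print(row)
```

The trade-off: the range join rescans the base rows for every pair, while the window-function version only looks up one day per pair. The number of pairs grows quadratically either way (365 days per campaign/group already gives 66,795 ranges).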
