I would like to have a table with start_date, end_date, campaign, group, and sum(value) for every possible combination of date ranges.
I tried a cross join where t1.campaign = t2.campaign AND t1.group = t2.group AND t1.date <= t2.date,
but couldn't figure out how to get the correct sum of values for those date ranges. Maybe a cross join with some kind of lag function? Not sure.
Any help would be much appreciated.
Current Redshift table
date campaign group value
1/1/2019 campaign_1 control 5
1/2/2019 campaign_1 control 7
1/4/2019 campaign_1 control 8
1/5/2019 campaign_1 control 14
1/7/2019 campaign_1 control 11
Desired Redshift table
start_date end_date campaign group value
1/1/19 1/1/19 campaign_1 control 5
1/1/19 1/2/19 campaign_1 control 12
1/1/19 1/3/19 campaign_1 control 12
1/1/19 1/4/19 campaign_1 control 20
1/1/19 1/5/19 campaign_1 control 34
1/1/19 1/6/19 campaign_1 control 34
1/1/19 1/7/19 campaign_1 control 45
1/2/19 1/2/19 campaign_1 control 7
1/2/19 1/3/19 campaign_1 control 7
1/2/19 1/4/19 campaign_1 control 15
1/2/19 1/5/19 campaign_1 control 29
1/2/19 1/6/19 campaign_1 control 29
1/2/19 1/7/19 campaign_1 control 40
1/3/19 1/3/19 campaign_1 control 0
1/3/19 1/4/19 campaign_1 control 8
1/3/19 1/5/19 campaign_1 control 22
1/3/19 1/6/19 campaign_1 control 22
1/3/19 1/7/19 campaign_1 control 33
1/4/19 1/4/19 campaign_1 control 8
1/4/19 1/5/19 campaign_1 control 22
1/4/19 1/6/19 campaign_1 control 22
1/4/19 1/7/19 campaign_1 control 33
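You will need to combine a calendar table containing every date in the range, a sub-query, and a window function to generate a table in the format you want. The general steps I would take are:
1. Build a calendar table containing every date in the range of interest.
2. Find the last date for each campaign and group.
3. Pair each start date in the data with every calendar date from that start date up to the last date.
4. Left join the original values onto those pairs by end date, using 0 where a date has no row.
5. Take a running sum of the values within each campaign, group, and start date, ordered by end date.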
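The following query yields output in the format you specify, given the base table you have defined (I have renamed group to group_name and value to value_amt to avoid keyword conflicts). Note that it takes start dates only from dates that have rows in the table, so a start date with no data, such as 1/3/19 in your desired output, will not appear unless you also draw start dates from the calendar: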
-- Generating a base calendar table from cartesian integer joins
WITH month_numerals AS (
    SELECT 1 AS month_numeral, 31 AS month_end_numeral
    UNION SELECT 2, 28
    UNION SELECT 3, 31
    UNION SELECT 4, 30
    UNION SELECT 5, 31
    UNION SELECT 6, 30
    UNION SELECT 7, 31
    UNION SELECT 8, 31
    UNION SELECT 9, 30
    UNION SELECT 10, 31
    UNION SELECT 11, 30
    UNION SELECT 12, 31
),
years AS (
    SELECT 2018 AS year_numeral
    UNION SELECT 2019
    UNION SELECT 2020
),
days AS (
    SELECT 1 AS day_numeral
    UNION SELECT 2
    UNION SELECT 3
    UNION SELECT 4
    UNION SELECT 5
    UNION SELECT 6
    UNION SELECT 7
    UNION SELECT 8
    UNION SELECT 9
    UNION SELECT 10
    UNION SELECT 11
    UNION SELECT 12
    UNION SELECT 13
    UNION SELECT 14
    UNION SELECT 15
    UNION SELECT 16
    UNION SELECT 17
    UNION SELECT 18
    UNION SELECT 19
    UNION SELECT 20
    UNION SELECT 21
    UNION SELECT 22
    UNION SELECT 23
    UNION SELECT 24
    UNION SELECT 25
    UNION SELECT 26
    UNION SELECT 27
    UNION SELECT 28
    UNION SELECT 29
    UNION SELECT 30
    UNION SELECT 31
),
base_calendar_numerals AS (
    SELECT year_numeral
         , month_numeral
         , day_numeral
         -- Accounting for leap years (you may want to double check this logic is producing correct values)
         , CASE WHEN ((MOD(year_numeral, 4) = 0 AND MOD(year_numeral, 100) != 0)
                      OR MOD(year_numeral, 400) = 0) AND month_numeral = 2 THEN 29
                ELSE month_end_numeral
           END AS month_end_number_calculated
    FROM years
    CROSS JOIN month_numerals
    CROSS JOIN days
    WHERE day_numeral <= month_end_number_calculated
),
base_calendar AS (
    -- Building a zero-padded YYYYMMDD string and parsing it with TO_DATE()
    SELECT TO_DATE(to_char(year_numeral, 'FM0000')
                   || to_char(month_numeral, 'FM00')
                   || to_char(day_numeral, 'FM00'), 'YYYYMMDD') AS base_date
    FROM base_calendar_numerals
),
end_dates AS (
    -- Last date present for each campaign and group
    SELECT campaign
         , group_name
         , MAX(date) AS end_date
    FROM table_name
    GROUP BY 1, 2
),
pivoted_base_data AS (
    -- One row per start date found in the data, with the last date attached
    SELECT n.campaign
         , n.group_name
         , n.date AS start_date
         , e.end_date
    FROM table_name n
    JOIN end_dates e ON (n.campaign = e.campaign AND n.group_name = e.group_name)
    GROUP BY 1, 2, 3, 4
),
transposed_base_data AS (
    -- Expanding each start date into one row per calendar date up to the last date
    SELECT campaign
         , group_name
         , start_date
         , base_date AS end_date
    FROM pivoted_base_data p
    JOIN base_calendar c ON c.base_date BETWEEN p.start_date AND p.end_date
    GROUP BY 1, 2, 3, 4
),
raw_result AS (
    -- Attaching the value recorded on each end date (0 when that date has no row)
    SELECT start_date
         , end_date
         , t.campaign
         , t.group_name
         , COALESCE(value_amt, 0) AS value_amt
    FROM transposed_base_data t
    LEFT JOIN table_name n ON (t.campaign = n.campaign
                               AND t.group_name = n.group_name
                               AND n.date = t.end_date)
    GROUP BY 1, 2, 3, 4, 5
)
-- Running sum of values from each start date through each end date
SELECT start_date
     , end_date
     , campaign
     , group_name
     , SUM(value_amt) OVER (PARTITION BY campaign, group_name, start_date
                            ORDER BY start_date, end_date
                            ROWS UNBOUNDED PRECEDING) AS value_amt
FROM raw_result
ORDER BY start_date, campaign, group_name
;
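For comparison, here is a minimal sketch of a more direct formulation that may be easier to follow: pair every calendar date with every later-or-equal calendar date, then sum the values that fall inside each pair. It swaps the window function for a plain GROUP BY, and because start dates come from the calendar it also covers start dates that have no rows, such as 1/3/19. The names sample_data, calendar, segments, date_ranges, event_date and cal_date are hypothetical and exist only for this illustration; the sample rows are inlined so the query runs on its own, and in practice you would point it at your real table and at a full calendar such as the base_calendar CTE above.

```sql
-- Hypothetical sample data, inlined so the sketch is self-contained
WITH sample_data AS (
    SELECT DATE '2019-01-01' AS event_date, 'campaign_1' AS campaign,
           'control' AS group_name, 5 AS value_amt
    UNION ALL SELECT DATE '2019-01-02', 'campaign_1', 'control', 7
    UNION ALL SELECT DATE '2019-01-04', 'campaign_1', 'control', 8
    UNION ALL SELECT DATE '2019-01-05', 'campaign_1', 'control', 14
    UNION ALL SELECT DATE '2019-01-07', 'campaign_1', 'control', 11
),
-- Small stand-in calendar covering only the sample week (assumption for the sketch)
calendar AS (
    SELECT DATE '2019-01-01' AS cal_date
    UNION ALL SELECT DATE '2019-01-02'
    UNION ALL SELECT DATE '2019-01-03'
    UNION ALL SELECT DATE '2019-01-04'
    UNION ALL SELECT DATE '2019-01-05'
    UNION ALL SELECT DATE '2019-01-06'
    UNION ALL SELECT DATE '2019-01-07'
),
-- Every campaign/group combination present in the data
segments AS (
    SELECT DISTINCT campaign, group_name
    FROM sample_data
),
-- Every (start_date, end_date) pair with start_date <= end_date, per segment
date_ranges AS (
    SELECT g.campaign
         , g.group_name
         , s.cal_date AS start_date
         , e.cal_date AS end_date
    FROM segments g
    CROSS JOIN calendar s
    JOIN calendar e ON e.cal_date >= s.cal_date
)
-- Sum the values whose date falls inside each range; ranges with no rows get 0
SELECT r.start_date
     , r.end_date
     , r.campaign
     , r.group_name
     , COALESCE(SUM(d.value_amt), 0) AS value_amt
FROM date_ranges r
LEFT JOIN sample_data d ON (d.campaign = r.campaign
                            AND d.group_name = r.group_name
                            AND d.event_date BETWEEN r.start_date AND r.end_date)
GROUP BY r.start_date, r.end_date, r.campaign, r.group_name
ORDER BY r.start_date, r.end_date
;
```

With the five sample rows this produces one row per range in the week, for example 12 for 1/1/19 to 1/2/19 and 0 for 1/3/19 to 1/3/19. The range join compares every date pair against every data row, so on a large table the windowed running sum above is likely to scale better; treat this version as a readable cross-check rather than a replacement.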