[英]BigQuery SQL - Concatenate two columns if they are on consecutive days
我正在尋找一種方法來調整在 BigQuery 中運行的 sql 查詢,以返回連續兩天甚至三天發生的已發送事件類型的單個計數總數。
SELECT date(EventDate) as EventDate, EventType, count(*) as count FROM `Database.Table`
where date(EventDate) > DATE_SUB (CURRENT_DATE, INTERVAL 100 DAY)
Group by 1,2
ORDER by 1,2
來自上述查詢的響應:
| Row | EventDate | EventType | count |
| ------ | --------- |-----------|-------|
| 1 | 2019-02-06| Sent | 4 |
| 2 | 2019-02-07| Sent | 5 |
| 3 | 2019-02-12| NotSent | 7 |
| 4 | 2019-02-13| Bounces | 22 |
| 5 | 2019-02-14| Bounces | 22 |
| 6 | 2019-03-06| Sent | 2 |
| 7 | 2019-03-07| Sent | 4 |
| 8 | 2019-03-07| NotSent | 5 |
| 9 | 2019-03-12| Bounces | 7 |
| 10 | 2019-03-13| Sent | 22 |
| 11 | 2019-04-05| Sent | 2 |
我想得到的回應:
| Row | EventDate | EventType | count |
| ------ | --------- |-----------|-------|
| 1 | 2019-02-06| Sent | 9 |
| 2 | 2019-02-12| NotSent | 7 |
| 3 | 2019-02-13| Bounces | 22 |
| 4 | 2019-02-14| Bounces | 22 |
| 5 | 2019-03-06| Sent | 6 |
| 6 | 2019-03-07| NotSent | 5 |
| 7 | 2019-03-12| Bounces | 7 |
| 8 | 2019-03-13| Sent | 22 |
| 9 | 2019-04-05| Sent | 2 |
沿着這條線的東西,所以我可以連續幾天將兩個計數與“已發送”的 EventType 連接起來,並顯示其他 EventType 而不連接它們,例如 Bounces 和 NotSent。
我寫了一個查詢,合並表中所有連續的 2 天。
它提供了您想要的完全相同的 output。
我認為您的意思是第 5 行中的“2019-03-06”,所以我在我的虛擬數據部分中修復了它。
WITH
data AS (
SELECT CAST('2019-02-06' as date) as EventDate, 4 as count union all
SELECT CAST('2019-02-07' as date) as EventDate, 5 as count union all
SELECT CAST('2019-02-12' as date) as EventDate, 7 as count union all
SELECT CAST('2019-02-13' as date) as EventDate, 22 as count union all
SELECT CAST('2019-03-06' as date) as EventDate, 2 as count
),
data_with_steps AS (
SELECT *,
IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), day) > 2, 1, 0) as new_step
FROM data
),
data_grouped AS (
SELECT *,
SUM(new_step) OVER (ORDER BY EventDate) as step_group
FROM data_with_steps
)
SELECT MIN(EventDate) as EventDate, sum(count) as count
FROM data_grouped
GROUP BY step_group
那么它是怎樣工作的?
首先,我計算與前一天的日期差。 如果超過 2 天,我將值設置為 1,否則為新列new_step
設置為 0。
然后,我計算new_step
列的累積和並將其命名為 step_group。
前兩步的output為:
在最后一步,我按 step_group 對表進行分組,並獲得最小日期作為事件日期,並對計數求和以獲得組計數。
編輯:要添加其他事件而不分組,我添加了一個新版本。 我認為最直觀和最簡單的方法是使用Union All
來解決這個問題。 因此,您可以使用該更新后的查詢來包含其他事件而無需分組。
WITH
data AS (
SELECT CAST('2019-02-06' as date) as EventDate, 'Sent' as EventType, 4 as count union all
SELECT CAST('2019-02-07' as date) as EventDate, 'Sent' as EventType, 5 as count union all
SELECT CAST('2019-02-12' as date) as EventDate, 'Sent' as EventType, 7 as count union all
SELECT CAST('2019-02-13' as date) as EventDate, 'Sent' as EventType, 22 as count union all
SELECT CAST('2019-03-06' as date) as EventDate, 'Sent' as EventType, 2 as count union all
SELECT CAST('2019-02-12' as date) as EventDate, 'NotSent' as EventType, 7 as count union all
SELECT CAST('2019-03-07' as date) as EventDate, 'NotSent' as EventType, 5 as count union all
SELECT CAST('2019-02-13' as date) as EventDate, 'Bounces' as EventType, 22 as count union all
SELECT CAST('2019-02-14' as date) as EventDate, 'Bounces' as EventType, 22 as count union all
SELECT CAST('2019-03-12' as date) as EventDate, 'Bounces' as EventType, 7 as count
),
data_with_steps AS (
SELECT *,
IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), day) > 2, 1, 0) as new_step
FROM data
WHERE EventType = 'Sent'
),
data_grouped AS (
SELECT *,
SUM(new_step) OVER (ORDER BY EventDate) as step_group
FROM data_with_steps
)
SELECT EventType, MIN(EventDate) as EventDate, sum(count) as count
FROM data_grouped
GROUP BY EventType, step_group
UNION ALL
SELECT EventType, EventDate, count
FROM data
WHERE EventType != 'Sent'
這是一個差距和孤島問題。 最簡單的方法是使用row_number()
和減法來識別“島嶼”。 然后聚合:
select min(row), eventType, min(eventDate), sum(count)
from (select t.*,
row_number() over (partition by eventType order by eventDate) as seqnum
from t
) t
group by eventType, dateadd(eventDate, interval -seqnum day)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.