BigQuery SQL - Concatenate two columns if they are on consecutive days
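I'm looking for a way to adjust a SQL query that runs in BigQuery so that it returns a single total count for "Sent" events that occur on two or even three consecutive days.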
SELECT date(EventDate) as EventDate, EventType, count(*) as count FROM `Database.Table`
where date(EventDate) > DATE_SUB (CURRENT_DATE, INTERVAL 100 DAY)
Group by 1,2
ORDER by 1,2
Output of the above query:
| Row | EventDate | EventType | count |
| ------ | --------- |-----------|-------|
| 1 | 2019-02-06| Sent | 4 |
| 2 | 2019-02-07| Sent | 5 |
| 3 | 2019-02-12| NotSent | 7 |
| 4 | 2019-02-13| Bounces | 22 |
| 5 | 2019-02-14| Bounces | 22 |
| 6 | 2019-03-06| Sent | 2 |
| 7 | 2019-03-07| Sent | 4 |
| 8 | 2019-03-07| NotSent | 5 |
| 9 | 2019-03-12| Bounces | 7 |
| 10 | 2019-03-13| Sent | 22 |
| 11 | 2019-04-05| Sent | 2 |
The output I'd like to get:
| Row | EventDate | EventType | count |
| ------ | --------- |-----------|-------|
| 1 | 2019-02-06| Sent | 9 |
| 2 | 2019-02-12| NotSent | 7 |
| 3 | 2019-02-13| Bounces | 22 |
| 4 | 2019-02-14| Bounces | 22 |
| 5 | 2019-03-06| Sent | 6 |
| 6 | 2019-03-07| NotSent | 5 |
| 7 | 2019-03-12| Bounces | 7 |
| 8 | 2019-03-13| Sent | 22 |
| 9 | 2019-04-05| Sent | 2 |
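Something along these lines, so that I can combine the counts of the "Sent" EventType on consecutive days into one total, while showing the other EventTypes, such as Bounces and NotSent, without combining them.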
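I wrote a query that merges all runs of consecutive days in the table (a new run starts only after a gap of more than 2 days).
For the Sent rows, it gives the same output you want.
I think you meant "2019-03-06" in row 5, so I fixed that in my dummy data.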
WITH
data AS (
-- the Sent rows from the question's daily counts
SELECT CAST('2019-02-06' as date) as EventDate, 4 as count union all
SELECT CAST('2019-02-07' as date) as EventDate, 5 as count union all
SELECT CAST('2019-03-06' as date) as EventDate, 2 as count union all
SELECT CAST('2019-03-07' as date) as EventDate, 4 as count union all
SELECT CAST('2019-03-13' as date) as EventDate, 22 as count union all
SELECT CAST('2019-04-05' as date) as EventDate, 2 as count
),
data_with_steps AS (
SELECT *,
IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), day) > 2, 1, 0) as new_step
FROM data
),
data_grouped AS (
SELECT *,
SUM(new_step) OVER (ORDER BY EventDate) as step_group
FROM data_with_steps
)
SELECT MIN(EventDate) as EventDate, sum(count) as count
FROM data_grouped
GROUP BY step_group
ORDER BY 1
So how does it work?
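First, I compute the date difference from the previous row's date. If it is more than 2 days, I set a new column new_step to 1, otherwise to 0. For the first row LAG returns NULL, so the condition is not true and new_step is 0. Note that with > 2, dates one day apart from a gap (e.g. 2019-03-06 and 2019-03-08) still land in the same group; use > 1 if only strictly consecutive days should be merged.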
Then I compute the cumulative (running) sum of the new_step column and call it step_group.
(Screenshot of the output of the first two steps not available.)
In the last step, I group the table by step_group, take the minimum date as the EventDate, and sum the counts to get each group's total.
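To make the three steps concrete without a BigQuery project, here is a minimal Python sketch of the same logic, run on the Sent rows from the question held in memory rather than in a table. It is an illustration, not the answer's query: the function name `group_runs` and its `max_gap` parameter are hypothetical, and `max_gap=2` mirrors the `> 2` in the SQL.

```python
from datetime import date

# Sent rows from the question's daily counts: (EventDate, count).
SENT_ROWS = [
    (date(2019, 2, 6), 4),
    (date(2019, 2, 7), 5),
    (date(2019, 3, 6), 2),
    (date(2019, 3, 7), 4),
    (date(2019, 3, 13), 22),
    (date(2019, 4, 5), 2),
]


def group_runs(rows, max_gap=2):
    """Hypothetical helper mimicking new_step, step_group and GROUP BY step_group.

    A row starts a new group when it is more than max_gap days after the
    previous row (the query uses 2; use 1 for strictly consecutive days).
    Returns (MIN(EventDate), SUM(count)) per group, in date order.
    """
    step_group = 0
    groups = {}  # step_group -> [MIN(EventDate), SUM(count)]
    prev_date = None
    for event_date, count in sorted(rows):
        # Step 1: new_step is 0 for the first row (LAG is NULL) or for a small gap.
        gap = None if prev_date is None else (event_date - prev_date).days
        new_step = 1 if gap is not None and gap > max_gap else 0
        # Step 2: running SUM(new_step) OVER (ORDER BY EventDate).
        step_group += new_step
        # Step 3: rows are sorted, so the first date seen in a group is its minimum.
        groups.setdefault(step_group, [event_date, 0])[1] += count
        prev_date = event_date
    return [(first, total) for first, total in groups.values()]


if __name__ == "__main__":
    for event_date, total in group_runs(SENT_ROWS):
        print(event_date, total)
    # 2019-02-06 9
    # 2019-03-06 6
    # 2019-03-13 22
    # 2019-04-05 2
```

Passing `max_gap=1` switches the sketch to strictly consecutive days, which gives the same result on this data because no two Sent dates are exactly two days apart.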
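Edit: To include the other events without grouping them, I added a new version. I think the most intuitive and simplest way to solve this is with UNION ALL: group only the Sent rows, then append the other event types unchanged. You can use this updated query to include the other events without grouping.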
WITH
data AS (
SELECT CAST('2019-02-06' as date) as EventDate, 'Sent' as EventType, 4 as count union all
SELECT CAST('2019-02-07' as date) as EventDate, 'Sent' as EventType, 5 as count union all
SELECT CAST('2019-03-06' as date) as EventDate, 'Sent' as EventType, 2 as count union all
SELECT CAST('2019-03-07' as date) as EventDate, 'Sent' as EventType, 4 as count union all
SELECT CAST('2019-03-13' as date) as EventDate, 'Sent' as EventType, 22 as count union all
SELECT CAST('2019-04-05' as date) as EventDate, 'Sent' as EventType, 2 as count union all
SELECT CAST('2019-02-12' as date) as EventDate, 'NotSent' as EventType, 7 as count union all
SELECT CAST('2019-03-07' as date) as EventDate, 'NotSent' as EventType, 5 as count union all
SELECT CAST('2019-02-13' as date) as EventDate, 'Bounces' as EventType, 22 as count union all
SELECT CAST('2019-02-14' as date) as EventDate, 'Bounces' as EventType, 22 as count union all
SELECT CAST('2019-03-12' as date) as EventDate, 'Bounces' as EventType, 7 as count
),
data_with_steps AS (
SELECT *,
IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), day) > 2, 1, 0) as new_step
FROM data
WHERE EventType = 'Sent'
),
data_grouped AS (
SELECT *,
SUM(new_step) OVER (ORDER BY EventDate) as step_group
FROM data_with_steps
)
SELECT MIN(EventDate) as EventDate, EventType, sum(count) as count
FROM data_grouped
GROUP BY EventType, step_group
UNION ALL
SELECT EventDate, EventType, count
FROM data
WHERE EventType != 'Sent'
ORDER BY EventDate, EventType
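As a cross-check of what the UNION ALL version computes, here is a minimal Python sketch on the question's daily counts copied into a list (an assumption: in BigQuery these rows would come from the original GROUP BY query). The name `merge_sent_runs` is hypothetical. The first part merges Sent rows whose dates are at most `max_gap` days apart, the second passes the other event types through unchanged, and the final sort orders the result by date and then event type, as in the desired output.

```python
from datetime import date

# The question's daily counts: (EventDate, EventType, count).
DAILY = [
    (date(2019, 2, 6), "Sent", 4),
    (date(2019, 2, 7), "Sent", 5),
    (date(2019, 2, 12), "NotSent", 7),
    (date(2019, 2, 13), "Bounces", 22),
    (date(2019, 2, 14), "Bounces", 22),
    (date(2019, 3, 6), "Sent", 2),
    (date(2019, 3, 7), "Sent", 4),
    (date(2019, 3, 7), "NotSent", 5),
    (date(2019, 3, 12), "Bounces", 7),
    (date(2019, 3, 13), "Sent", 22),
    (date(2019, 4, 5), "Sent", 2),
]


def merge_sent_runs(rows, max_gap=2):
    """Hypothetical helper: merge runs of Sent rows, keep other types as they are."""
    merged = []  # first UNION ALL branch: one row per run of Sent days
    prev_date = None
    for event_date, _, count in sorted(r for r in rows if r[1] == "Sent"):
        if prev_date is not None and (event_date - prev_date).days <= max_gap:
            # Same run: add this day's count to the run that started earlier.
            first_date, event_type, total = merged[-1]
            merged[-1] = (first_date, event_type, total + count)
        else:
            merged.append((event_date, "Sent", count))  # start a new run
        prev_date = event_date
    others = [r for r in rows if r[1] != "Sent"]  # second branch: EventType != 'Sent'
    return sorted(merged + others)  # by EventDate, then EventType


if __name__ == "__main__":
    for row in merge_sent_runs(DAILY):
        print(*row)
```

The result has the nine rows of the desired output table, e.g. `2019-02-06 Sent 9` and `2019-03-06 Sent 6`, while the two Bounces days 2019-02-13 and 2019-02-14 stay separate.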
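This is a gaps-and-islands problem. The simplest approach is to use row_number() and subtract it from the date to identify the "islands", then aggregate: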
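select min(eventDate) as eventDate, eventType, sum(count) as count
from (select t.*,
             row_number() over (partition by eventType order by eventDate) as seqnum
      from t  -- t = the daily counts returned by the query in the question
     ) t
group by eventType,
         -- consecutive Sent days share the same (eventDate - seqnum); other types stay one row per day
         if(eventType = 'Sent', date_sub(eventDate, interval seqnum day), eventDate)
order by 1, 2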
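The row_number() trick can be checked the same way. This Python sketch (hypothetical function `islands`, run on an in-memory subset of the question's data) numbers each event type's rows by date and subtracts that number from the date: consecutive days then share one anchor date, which becomes the grouping key. Unlike the `> 2` threshold in the first answer, this merges only strictly consecutive days. Only the types listed in the hypothetical `merge_types` parameter are merged, so Bounces stays one row per day by default.

```python
from collections import defaultdict
from datetime import date, timedelta

# A subset of the question's daily counts: (EventDate, EventType, count).
DAILY = [
    (date(2019, 2, 6), "Sent", 4),
    (date(2019, 2, 7), "Sent", 5),
    (date(2019, 2, 13), "Bounces", 22),
    (date(2019, 2, 14), "Bounces", 22),
    (date(2019, 3, 6), "Sent", 2),
    (date(2019, 3, 7), "Sent", 4),
    (date(2019, 3, 13), "Sent", 22),
]


def islands(rows, merge_types=("Sent",)):
    """Hypothetical helper: gaps and islands via date minus a per-type row number."""
    seqnum = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY EventType ORDER BY EventDate)
    groups = {}  # (EventType, island key) -> [MIN(EventDate), SUM(count)]
    for event_date, event_type, count in sorted(rows):
        seqnum[event_type] += 1
        if event_type in merge_types:
            # Consecutive days map to the same anchor date after subtracting seqnum.
            key = (event_type, event_date - timedelta(days=seqnum[event_type]))
        else:
            key = (event_type, event_date)  # never merged: one group per day
        if key not in groups:
            groups[key] = [event_date, 0]
        groups[key][1] += count
    return sorted((d, t, c) for (t, _), (d, c) in groups.items())


if __name__ == "__main__":
    for row in islands(DAILY):
        print(*row)
```

For the Sent rows, 2019-02-06 and 2019-02-07 get row numbers 1 and 2, so both map to the anchor 2019-02-05 and are summed to 9; 2019-03-13 gets row number 5 and maps to 2019-03-08, a separate island.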