
BigQuery SQL - Concatenate two columns if they are on consecutive days

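I'm looking for a way to adjust a SQL query running in BigQuery so that it returns a single total count for Sent event types that occurred on two or even three consecutive days.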

SELECT date(EventDate) as EventDate, EventType, count(*) as count FROM `Database.Table`
    where date(EventDate) > DATE_SUB (CURRENT_DATE, INTERVAL 100 DAY)
    Group by 1,2 
    ORDER by 1,2

Response from the above query:

| Row    | EventDate | EventType | count |
| ------ | --------- |-----------|-------|
| 1      | 2019-02-06|  Sent     |    4  |
| 2      | 2019-02-07|  Sent     |    5  |
| 3      | 2019-02-12|  NotSent  |    7  |
| 4      | 2019-02-13|  Bounces  |    22 |
| 5      | 2019-02-14|  Bounces  |    22 |
| 6      | 2019-03-06|  Sent     |    2  |
| 7      | 2019-03-07|  Sent     |    4  |
| 8      | 2019-03-07|  NotSent  |    5  |
| 9      | 2019-03-12|  Bounces  |    7  |
| 10     | 2019-03-13|  Sent     |    22 |
| 11     | 2019-04-05|  Sent     |    2  |

The response I would like to get:

| Row    | EventDate | EventType | count |
| ------ | --------- |-----------|-------|
| 1      | 2019-02-06|  Sent     |    9  |
| 2      | 2019-02-12|  NotSent  |    7  |
| 3      | 2019-02-13|  Bounces  |    22 |
| 4      | 2019-02-14|  Bounces  |    22 |
| 5      | 2019-03-06|  Sent     |    6  |
| 6      | 2019-03-07|  NotSent  |    5  |
| 7      | 2019-03-12|  Bounces  |    7  |
| 8      | 2019-03-13|  Sent     |    22 |
| 9      | 2019-04-05|  Sent     |    2  |

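Something along those lines, so that I can combine the counts for consecutive days where the EventType is Sent, and show the other EventTypes, such as Bounces and NotSent, without combining them.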

I wrote a query that merges all consecutive days in the table.
It produces exactly the output you want.

I think you meant '2019-03-06' in row 5, so I fixed it in my dummy data section.

WITH
data AS (
  SELECT CAST('2019-02-06' as date) as EventDate, 4 as count union all
  SELECT CAST('2019-02-07' as date) as EventDate, 5 as count union all
  SELECT CAST('2019-02-12' as date) as EventDate, 7 as count union all
  SELECT CAST('2019-02-13' as date) as EventDate, 22 as count union all
  SELECT CAST('2019-03-06' as date) as EventDate, 2 as count
),
data_with_steps AS (
  SELECT *, 
    IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), day) > 2, 1, 0) as new_step
  FROM data
),
data_grouped AS (
  SELECT *, 
    SUM(new_step) OVER (ORDER BY EventDate) as step_group
  FROM data_with_steps
)
SELECT MIN(EventDate) as EventDate, sum(count) as count
FROM data_grouped
GROUP BY step_group

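So how does it work?
First, I compute the difference in days between each row's date and the previous row's date. If the gap is more than 2 days, the new column new_step is set to 1; otherwise it is set to 0.
Then I take the cumulative sum of the new_step column and name it step_group.
[Image: output of the first two steps]

In the last step, I group the table by step_group, take the minimum date as the event date, and sum the counts to get the group count.
[Image: output of the final step]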
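
The intermediate columns can also be traced by hand. The sketch below reuses the five dummy rows from the query above and selects from data_grouped directly. The values in the trailing comments were worked out manually from those rows; they are not copied from the original answer. Note that with the `> 2` threshold a gap of exactly two days still lands in the same group; if only strictly adjacent days should merge, `> 1` would be the threshold to try (that reading of the intent is an assumption).

```sql
WITH
data AS (
  SELECT DATE '2019-02-06' AS EventDate, 4 AS count UNION ALL
  SELECT DATE '2019-02-07', 5 UNION ALL
  SELECT DATE '2019-02-12', 7 UNION ALL
  SELECT DATE '2019-02-13', 22 UNION ALL
  SELECT DATE '2019-03-06', 2
),
data_with_steps AS (
  SELECT *,
    -- LAG is NULL on the first row, so the comparison is NULL and IF returns 0
    IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), DAY) > 2, 1, 0) AS new_step
  FROM data
),
data_grouped AS (
  SELECT *,
    -- running total of new_step: increases by one each time a new group starts
    SUM(new_step) OVER (ORDER BY EventDate) AS step_group
  FROM data_with_steps
)
SELECT EventDate, count, new_step, step_group
FROM data_grouped
ORDER BY 1
-- EventDate   count  new_step  step_group
-- 2019-02-06  4      0         0
-- 2019-02-07  5      0         0
-- 2019-02-12  7      1         1
-- 2019-02-13  22     0         1
-- 2019-03-06  2      1         2
```

Grouping these rows by step_group then gives 2019-02-06 with a count of 9, 2019-02-12 with 29, and 2019-03-06 with 2.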

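Edit: To include the other events without grouping them, I added a new version. I think the most intuitive and simplest way to solve this is UNION ALL, so you can use the updated query below to include the other events ungrouped.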

WITH
data AS (
  SELECT CAST('2019-02-06' as date) as EventDate, 'Sent' as EventType, 4 as count union all
  SELECT CAST('2019-02-07' as date) as EventDate, 'Sent' as EventType, 5 as count union all
  SELECT CAST('2019-02-12' as date) as EventDate, 'Sent' as EventType, 7 as count union all
  SELECT CAST('2019-02-13' as date) as EventDate, 'Sent' as EventType, 22 as count union all
  SELECT CAST('2019-03-06' as date) as EventDate, 'Sent' as EventType, 2 as count union all
  SELECT CAST('2019-02-12' as date) as EventDate, 'NotSent' as EventType, 7 as count union all
  SELECT CAST('2019-03-07' as date) as EventDate, 'NotSent' as EventType, 5 as count union all
  SELECT CAST('2019-02-13' as date) as EventDate, 'Bounces' as EventType, 22 as count union all
  SELECT CAST('2019-02-14' as date) as EventDate, 'Bounces' as EventType, 22 as count union all
  SELECT CAST('2019-03-12' as date) as EventDate, 'Bounces' as EventType, 7 as count
),
data_with_steps AS (
  SELECT *, 
    IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), day) > 2, 1, 0) as new_step
  FROM data
  WHERE EventType = 'Sent'
),
data_grouped AS (
  SELECT *, 
    SUM(new_step) OVER (ORDER BY EventDate) as step_group
  FROM data_with_steps
)
SELECT EventType, MIN(EventDate) as EventDate, sum(count) as count
FROM data_grouped
GROUP BY EventType, step_group

UNION ALL

SELECT EventType, EventDate, count
FROM data
WHERE EventType != 'Sent'

This is a gaps-and-islands problem. The simplest approach is to use row_number() and subtraction to identify the "islands", then aggregate:

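-- t stands for the per-day result of the query in the question.
-- min(row) is omitted because Row is the console's row index, not a column of t.
select eventType, min(eventDate) as startDate, sum(count) as total
from (select t.*,
             row_number() over (partition by eventType order by eventDate) as seqnum
      from t
     ) t
-- consecutive dates share the same (date - seqnum) value, which identifies the island
group by eventType, date_sub(eventDate, interval seqnum day)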
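
The query above merges runs of consecutive days for every event type, so the two Bounces rows on 2019-02-13 and 2019-02-14 would be combined as well. Here is a minimal sketch of one way to restrict the merging to Sent, using the rows from the question as inline data: the island key is the date minus the row number for Sent rows, and the date itself for everything else. The CTE names `data`, `numbered`, and `keyed` and the column `island_key` are placeholders of mine, not part of the original answer.

```sql
WITH
data AS (
  SELECT DATE '2019-02-06' AS EventDate, 'Sent' AS EventType, 4 AS count UNION ALL
  SELECT DATE '2019-02-07', 'Sent', 5 UNION ALL
  SELECT DATE '2019-02-12', 'NotSent', 7 UNION ALL
  SELECT DATE '2019-02-13', 'Bounces', 22 UNION ALL
  SELECT DATE '2019-02-14', 'Bounces', 22 UNION ALL
  SELECT DATE '2019-03-06', 'Sent', 2 UNION ALL
  SELECT DATE '2019-03-07', 'Sent', 4 UNION ALL
  SELECT DATE '2019-03-07', 'NotSent', 5 UNION ALL
  SELECT DATE '2019-03-12', 'Bounces', 7 UNION ALL
  SELECT DATE '2019-03-13', 'Sent', 22 UNION ALL
  SELECT DATE '2019-04-05', 'Sent', 2
),
numbered AS (
  SELECT *,
    -- position of each row within its event type, ordered by date
    ROW_NUMBER() OVER (PARTITION BY EventType ORDER BY EventDate) AS seqnum
  FROM data
),
keyed AS (
  SELECT *,
    -- Sent: consecutive dates share the same (date - seqnum) value
    -- other types: each date is its own key, so nothing is merged
    IF(EventType = 'Sent', DATE_SUB(EventDate, INTERVAL seqnum DAY), EventDate) AS island_key
  FROM numbered
)
SELECT MIN(EventDate) AS EventDate, EventType, SUM(count) AS count
FROM keyed
GROUP BY EventType, island_key
ORDER BY 1
```

With the question's data this should reproduce the desired table: the Sent rows on 2019-02-06 and 2019-02-07 collapse to a count of 9, those on 2019-03-06 and 2019-03-07 collapse to 6, and the Bounces and NotSent rows pass through unchanged.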
