
BigQuery SQL - Concatenate two columns if they are on consecutive days

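I am looking for a way to adjust a SQL query running in BigQuery so that it returns a single total count for Sent event types that occur on two or even three consecutive days.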

SELECT DATE(EventDate) AS EventDate, EventType, COUNT(*) AS count
FROM `Database.Table`
WHERE DATE(EventDate) > DATE_SUB(CURRENT_DATE(), INTERVAL 100 DAY)
GROUP BY 1, 2
ORDER BY 1, 2

Response from the query above:

| Row    | EventDate | EventType | count |
| ------ | --------- |-----------|-------|
| 1      | 2019-02-06|  Sent     |    4  |
| 2      | 2019-02-07|  Sent     |    5  |
| 3      | 2019-02-12|  NotSent  |    7  |
| 4      | 2019-02-13|  Bounces  |    22 |
| 5      | 2019-02-14|  Bounces  |    22 |
| 6      | 2019-03-06|  Sent     |    2  |
| 7      | 2019-03-07|  Sent     |    4  |
| 8      | 2019-03-07|  NotSent  |    5  |
| 9      | 2019-03-12|  Bounces  |    7  |
| 10     | 2019-03-13|  Sent     |    22 |
| 11     | 2019-04-05|  Sent     |    2  |

The response I would like to get:

| Row    | EventDate | EventType | count |
| ------ | --------- |-----------|-------|
| 1      | 2019-02-06|  Sent     |    9  |
| 2      | 2019-02-12|  NotSent  |    7  |
| 3      | 2019-02-13|  Bounces  |    22 |
| 4      | 2019-02-14|  Bounces  |    22 |
| 5      | 2019-03-06|  Sent     |    6  |
| 6      | 2019-03-07|  NotSent  |    5  |
| 7      | 2019-03-12|  Bounces  |    7  |
| 8      | 2019-03-13|  Sent     |    22 |
| 9      | 2019-04-05|  Sent     |    2  |

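Something along these lines, so that I can combine the counts for consecutive days where the EventType is "Sent", and show the other EventTypes, such as Bounces and NotSent, without combining them.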

I wrote a query that merges all runs of consecutive days in the table.
It gives exactly the output you want.

I think you meant "2019-03-06" in row 5, so I fixed that in my dummy data.

WITH
data AS (
  SELECT CAST('2019-02-06' as date) as EventDate, 4 as count union all
  SELECT CAST('2019-02-07' as date) as EventDate, 5 as count union all
  SELECT CAST('2019-02-12' as date) as EventDate, 7 as count union all
  SELECT CAST('2019-02-13' as date) as EventDate, 22 as count union all
  SELECT CAST('2019-03-06' as date) as EventDate, 2 as count
),
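data_with_steps AS (
  -- Flag a row with 1 when it is more than 2 days after the previous row
  SELECT *,
    IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), DAY) > 2, 1, 0) AS new_step
  FROM data
),
data_grouped AS (
  -- The running total of the flags gives each run of nearby dates its own id
  SELECT *,
    SUM(new_step) OVER (ORDER BY EventDate) AS step_group
  FROM data_with_steps
)
-- One row per group: earliest date and total count
SELECT MIN(EventDate) AS EventDate, SUM(count) AS count
FROM data_grouped
GROUP BY step_group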

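So how does it work?
First, I calculate the date difference from the previous row. If it is more than 2 days, I set the new column new_step to 1, otherwise to 0. Note that with this threshold, dates that are two days apart also fall into the same group; use > 1 to merge only strictly consecutive days.
Then, I calculate the cumulative sum of the new_step column and name it step_group.
[Image: output of the first two steps]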

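In the last step, I group the table by step_group, take the minimum date as the event date, and sum the counts to get the count for each group.
[Image: output of the last step]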
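Since the screenshots are not available, the steps above can be traced with a small Python sketch. It is only an illustration of the logic, not the BigQuery query itself; the function names `add_steps` and `collapse` and the `max_gap_days` parameter are made up for this example, and the data is the answer's dummy data.

```python
from datetime import date

# Dummy data from the answer: (EventDate, count)
data = [
    (date(2019, 2, 6), 4),
    (date(2019, 2, 7), 5),
    (date(2019, 2, 12), 7),
    (date(2019, 2, 13), 22),
    (date(2019, 3, 6), 2),
]

def add_steps(rows, max_gap_days=2):
    """Return (EventDate, count, new_step, step_group) rows, ordered by date."""
    out = []
    prev = None
    step_group = 0
    for event_date, count in sorted(rows):
        # Mirrors IF(DATE_DIFF(EventDate, LAG(EventDate) ...) > 2, 1, 0).
        # The first row has no previous row, so its flag is 0.
        is_new = prev is not None and (event_date - prev).days > max_gap_days
        new_step = 1 if is_new else 0
        step_group += new_step  # running SUM(new_step)
        out.append((event_date, count, new_step, step_group))
        prev = event_date
    return out

def collapse(stepped):
    """Mirrors GROUP BY step_group with MIN(EventDate) and SUM(count)."""
    groups = {}
    for event_date, count, _, step_group in stepped:
        first, total = groups.get(step_group, (event_date, 0))
        groups[step_group] = (min(first, event_date), total + count)
    return [groups[g] for g in sorted(groups)]

stepped = add_steps(data)
for event_date, count, new_step, step_group in stepped:
    print(event_date.isoformat(), count, new_step, step_group)

result = collapse(stepped)
for event_date, total in result:
    print(event_date.isoformat(), total)
```

For this dummy data the flags come out as 0, 0, 1, 0, 1 and the groups as 0, 0, 1, 1, 2, so the final rows are 2019-02-06 with 9, 2019-02-12 with 29, and 2019-03-06 with 2.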

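Edit: To include the other events without grouping them, I have added a new version. I think the most intuitive and simplest way to solve this is to use UNION ALL. So you can use the updated query below to include the other events without grouping them.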

WITH
data AS (
  SELECT CAST('2019-02-06' as date) as EventDate, 'Sent' as EventType, 4 as count union all
  SELECT CAST('2019-02-07' as date) as EventDate, 'Sent' as EventType, 5 as count union all
  SELECT CAST('2019-02-12' as date) as EventDate, 'Sent' as EventType, 7 as count union all
  SELECT CAST('2019-02-13' as date) as EventDate, 'Sent' as EventType, 22 as count union all
  SELECT CAST('2019-03-06' as date) as EventDate, 'Sent' as EventType, 2 as count union all
  SELECT CAST('2019-02-12' as date) as EventDate, 'NotSent' as EventType, 7 as count union all
  SELECT CAST('2019-03-07' as date) as EventDate, 'NotSent' as EventType, 5 as count union all
  SELECT CAST('2019-02-13' as date) as EventDate, 'Bounces' as EventType, 22 as count union all
  SELECT CAST('2019-02-14' as date) as EventDate, 'Bounces' as EventType, 22 as count union all
  SELECT CAST('2019-03-12' as date) as EventDate, 'Bounces' as EventType, 7 as count
),
data_with_steps AS (
  -- Only 'Sent' rows are candidates for merging
  SELECT *,
    IF(DATE_DIFF(EventDate, LAG(EventDate) OVER (ORDER BY EventDate), DAY) > 2, 1, 0) AS new_step
  FROM data
  WHERE EventType = 'Sent'
),
data_grouped AS (
  SELECT *,
    SUM(new_step) OVER (ORDER BY EventDate) AS step_group
  FROM data_with_steps
)
-- Merged 'Sent' groups
SELECT EventType, MIN(EventDate) AS EventDate, SUM(count) AS count
FROM data_grouped
GROUP BY EventType, step_group

UNION ALL

-- All other event types pass through unchanged
SELECT EventType, EventDate, count
FROM data
WHERE EventType != 'Sent'
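As a rough check of the UNION ALL idea against the rows in the question (rather than the dummy data), the same logic can be sketched in Python. The helper name `merge_sent` and its parameters are hypothetical; `max_gap_days=2` mirrors the `> 2` threshold in the query, and the final sort is an addition of this sketch, since the query itself has no ORDER BY.

```python
from datetime import date

# Aggregated rows from the question: (EventDate, EventType, count)
rows = [
    (date(2019, 2, 6), "Sent", 4),
    (date(2019, 2, 7), "Sent", 5),
    (date(2019, 2, 12), "NotSent", 7),
    (date(2019, 2, 13), "Bounces", 22),
    (date(2019, 2, 14), "Bounces", 22),
    (date(2019, 3, 6), "Sent", 2),
    (date(2019, 3, 7), "Sent", 4),
    (date(2019, 3, 7), "NotSent", 5),
    (date(2019, 3, 12), "Bounces", 7),
    (date(2019, 3, 13), "Sent", 22),
    (date(2019, 4, 5), "Sent", 2),
]

def merge_sent(rows, merged_type="Sent", max_gap_days=2):
    """Collapse nearby rows of merged_type; leave every other type untouched."""
    merged = []  # collapsed runs of merged_type
    last = None  # date of the previous merged_type row
    for event_date, event_type, count in sorted(rows):
        if event_type != merged_type:
            continue
        if last is not None and (event_date - last).days <= max_gap_days:
            # Same run: keep the run's first date and add to its total
            start, _, total = merged[-1]
            merged[-1] = (start, merged_type, total + count)
        else:
            merged.append((event_date, merged_type, count))
        last = event_date
    # The second branch of the UNION ALL: everything that is not merged_type
    others = [r for r in rows if r[1] != merged_type]
    return sorted(merged + others)

result = merge_sent(rows)
for event_date, event_type, count in result:
    print(event_date.isoformat(), event_type, count)
```

On these rows the sketch reproduces the nine rows of the desired response, with Sent totals of 9 and 6 for the two merged runs and the Bounces rows left separate.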

This is a gaps-and-islands problem. The simplest method is to use row_number() and subtraction to identify the "islands". Then aggregate:

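-- t is assumed to be the aggregated result shown in the question
-- (columns: row, eventType, eventDate, count)
select min(row), eventType, min(eventDate), sum(count)
from (select t.*,
             row_number() over (partition by eventType order by eventDate) as seqnum
      from t
     ) t
-- consecutive dates minus their row number all land on the same date
group by eventType, date_sub(eventDate, interval seqnum day)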
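To see why subtracting the row number identifies the islands, here is a minimal Python sketch of the same trick. It assumes one row per event type and date, as in the aggregated result from the question; the function name `islands` is made up for this example.

```python
from collections import defaultdict
from datetime import date, timedelta

# Aggregated rows from the question: (eventDate, eventType, count)
rows = [
    (date(2019, 2, 6), "Sent", 4),
    (date(2019, 2, 7), "Sent", 5),
    (date(2019, 2, 12), "NotSent", 7),
    (date(2019, 2, 13), "Bounces", 22),
    (date(2019, 2, 14), "Bounces", 22),
    (date(2019, 3, 6), "Sent", 2),
    (date(2019, 3, 7), "Sent", 4),
    (date(2019, 3, 7), "NotSent", 5),
    (date(2019, 3, 12), "Bounces", 7),
    (date(2019, 3, 13), "Sent", 22),
    (date(2019, 4, 5), "Sent", 2),
]

def islands(rows):
    """Return one (first date, event type, total count) row per island."""
    by_type = defaultdict(list)
    for event_date, event_type, count in rows:
        by_type[event_type].append((event_date, count))

    groups = {}
    for event_type, items in by_type.items():
        # seqnum plays the role of
        # row_number() over (partition by eventType order by eventDate)
        for seqnum, (event_date, count) in enumerate(sorted(items), start=1):
            # Dates in a consecutive run grow by one day per row, and so does
            # seqnum, so the difference is constant within an island.
            key = (event_type, event_date - timedelta(days=seqnum))
            start, total = groups.get(key, (event_date, 0))
            groups[key] = (min(start, event_date), total + count)

    return sorted(
        (start, event_type, total)
        for (event_type, _), (start, total) in groups.items()
    )

result = islands(rows)
for start, event_type, total in result:
    print(start.isoformat(), event_type, total)
```

Because the partition covers every event type, this also collapses the two consecutive Bounces rows into a single row with a count of 44. To leave the other event types untouched, as in the desired response, the input would need to be restricted to Sent rows first.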
