
SQL (BigQuery) - get rows with a specific value and add count when a sequence of that value is broken with another value by group

Suppose I have the following dataset:

'2022-01-01' - A - a
'2022-01-02' - A - a
'2022-01-03' - A - b
'2022-01-03' - A - a
'2022-01-01' - B - a
'2022-01-01' - B - b
'2022-01-02' - B - c
'2022-01-02' - B - a
'2022-01-03' - B - b
'2022-01-01' - C - a
'2022-01-02' - C - a
'2022-01-03' - C - a
'2022-01-01' - C - c
'2022-01-02' - C - a
'2022-01-03' - C - a
'2022-01-04' - C - b
'2022-01-05' - C - a

The query I am looking for would produce this dataset:

A - a - 1
A - a - 1
A - a - 2
B - a - 1
B - a - 2
C - a - 1
C - a - 1
C - a - 1
C - a - 2
C - a - 2
C - a - 3

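Basically, remove every row that does not have the value 'a' in the third column, and add a counter, per group of the second column, that increments whenever a sequence of 'a' values is broken by another value and then resumes. How can I achieve this?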
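The rule being asked for can be written as a small pure-Python sketch. It uses Python rather than SQL only so the expected result is easy to check; the names ROWS and count_runs and the (date, group, value) tuple layout are hypothetical. For each group it remembers the value of that group's previous row and starts a new run number whenever an 'a' follows anything other than 'a'.

```python
# Sample rows from the question as (date, group, value), in the order listed.
ROWS = [
    ("2022-01-01", "A", "a"), ("2022-01-02", "A", "a"),
    ("2022-01-03", "A", "b"), ("2022-01-03", "A", "a"),
    ("2022-01-01", "B", "a"), ("2022-01-01", "B", "b"),
    ("2022-01-02", "B", "c"), ("2022-01-02", "B", "a"),
    ("2022-01-03", "B", "b"),
    ("2022-01-01", "C", "a"), ("2022-01-02", "C", "a"),
    ("2022-01-03", "C", "a"), ("2022-01-01", "C", "c"),
    ("2022-01-02", "C", "a"), ("2022-01-03", "C", "a"),
    ("2022-01-04", "C", "b"), ("2022-01-05", "C", "a"),
]


def count_runs(rows, target="a"):
    """Keep only rows whose value is `target` and number each unbroken
    run of `target` within its group, starting at 1."""
    run_number = {}  # group -> number of the current run of `target`
    previous = {}    # group -> value of that group's previous row
    result = []
    for _date, group, value in rows:
        if value == target:
            if previous.get(group) != target:
                # first row of the group, or the run was broken by another value
                run_number[group] = run_number.get(group, 0) + 1
            result.append((group, value, run_number[group]))
        previous[group] = value
    return result


for row in count_runs(ROWS):
    print(*row, sep=" - ")
```

Run on the sample, this prints the same eleven rows as the expected output above. Because the state is kept per group, the sketch does not need each group's rows to be contiguous, but like any SQL version it relies on the rows arriving in the intended order.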

An approach using window functions:

with sample as (
  select '2022-01-01' as date, 'A' as l1, 'a' as l2 UNION ALL
  select '2022-01-02' as date, 'A' as l1, 'a' as l2 UNION ALL
  select '2022-01-03' as date, 'A' as l1, 'b' as l2 UNION ALL
  select '2022-01-03' as date, 'A' as l1, 'a' as l2 UNION ALL
  select '2022-01-01' as date, 'B' as l1, 'a' as l2 UNION ALL
  select '2022-01-01' as date, 'B' as l1, 'b' as l2 UNION ALL
  select '2022-01-02' as date, 'B' as l1, 'c' as l2 UNION ALL
  select '2022-01-02' as date, 'B' as l1, 'a' as l2 UNION ALL
  select '2022-01-03' as date, 'B' as l1, 'b' as l2 UNION ALL
  select '2022-01-01' as date, 'C' as l1, 'a' as l2 UNION ALL
  select '2022-01-02' as date, 'C' as l1, 'a' as l2 UNION ALL
  select '2022-01-03' as date, 'C' as l1, 'a' as l2 UNION ALL
  select '2022-01-01' as date, 'C' as l1, 'c' as l2 UNION ALL
  select '2022-01-02' as date, 'C' as l1, 'a' as l2 UNION ALL
  select '2022-01-03' as date, 'C' as l1, 'a' as l2 UNION ALL
  select '2022-01-04' as date, 'C' as l1, 'b' as l2 UNION ALL
  select '2022-01-05' as date, 'C' as l1, 'a' as l2
),
number_rows AS (
  SELECT 
    *,
    ROW_NUMBER() OVER () as row_number -- ideally you would have a column to order the rows
  FROM sample
),
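flag_changes AS (
  -- WHERE is applied before the window function, so LAG only sees the 'a' rows.
  -- A gap in row_number means rows were dropped in between (non-'a' rows,
  -- as long as each group's rows are numbered contiguously).
  -- On a group's first row LAG is NULL, so the IF condition is NULL and the flag is 0.
  SELECT
    *,
    IF((row_number - LAG(row_number) OVER (PARTITION BY l1 ORDER BY row_number ASC)) != 1, 1, 0) as changed
  FROM number_rows
  WHERE l2 = 'a'
)
SELECT
  * EXCEPT (date, row_number, changed),
  -- running count of breaks; the first run of each group is numbered 1
  1 + SUM(changed) OVER (PARTITION BY l1 ORDER BY row_number ASC) AS counter
FROM flag_changes
ORDER BY l1, row_number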

Output:

l1  l2  counter
A   a   1
A   a   1
A   a   2
B   a   1
B   a   2
C   a   1
C   a   1
C   a   1
C   a   2
C   a   2
C   a   3
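How the changed flag and the running SUM turn into this counter can be traced with a Python sketch that mirrors each CTE of the query. It assumes the rows arrive in the listed order, which is the role ROW_NUMBER() OVER () plays in the query; the names SAMPLE and emulate_query are hypothetical. Only the 'a' rows reach the LAG step, just as WHERE l2 = 'a' filters before the window functions are evaluated.

```python
# Sample rows from the question as (date, l1, l2), in the order listed.
SAMPLE = [
    ("2022-01-01", "A", "a"), ("2022-01-02", "A", "a"),
    ("2022-01-03", "A", "b"), ("2022-01-03", "A", "a"),
    ("2022-01-01", "B", "a"), ("2022-01-01", "B", "b"),
    ("2022-01-02", "B", "c"), ("2022-01-02", "B", "a"),
    ("2022-01-03", "B", "b"),
    ("2022-01-01", "C", "a"), ("2022-01-02", "C", "a"),
    ("2022-01-03", "C", "a"), ("2022-01-01", "C", "c"),
    ("2022-01-02", "C", "a"), ("2022-01-03", "C", "a"),
    ("2022-01-04", "C", "b"), ("2022-01-05", "C", "a"),
]


def emulate_query(rows, target="a"):
    # number_rows: number every row 1..n in input order (ROW_NUMBER() OVER ())
    numbered = [(n, l1, l2) for n, (_date, l1, l2) in enumerate(rows, start=1)]
    # flag_changes: the WHERE filter runs before LAG, so LAG only sees `target` rows
    kept = [(n, l1, l2) for n, l1, l2 in numbered if l2 == target]
    prev_number = {}  # l1 -> row_number of the previous kept row (the LAG value)
    breaks = {}       # l1 -> running SUM(changed) in row_number order
    result = []
    for n, l1, l2 in kept:
        prev = prev_number.get(l1)
        # LAG is NULL on a partition's first row, and IF(NULL, 1, 0) gives 0
        changed = 1 if prev is not None and n - prev != 1 else 0
        breaks[l1] = breaks.get(l1, 0) + changed
        result.append((l1, l2, 1 + breaks[l1]))
        prev_number[l1] = n
    return result


for row in emulate_query(SAMPLE):
    print(*row)
```

Walking the kept rows in global row-number order while keeping per-l1 state matches PARTITION BY l1 ORDER BY row_number, because each partition's rows are still visited in row-number order. For group C the kept row numbers are 10, 11, 12, 14, 15, 17, so the flags are 0, 0, 0, 1, 0, 1 and the counters are 1, 1, 1, 2, 2, 3.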

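Note: this solution depends on the rows being ordered and numbered the same way as in your sample data. In the example I used ROW_NUMBER() OVER (), but because it has no ORDER BY the numbering is not guaranteed, so on a large table you may get different results on each run.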
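The order dependence can be made concrete with a short Python sketch (gap_counter and INTERLEAVED are hypothetical names). It replays the query's gap test on a tiny input in which the rows of groups A and B alternate. With one global numbering, like ROW_NUMBER() OVER (), the B row sitting between A's two 'a' rows creates a gap in A's numbers and A's counter wrongly reaches 2. Numbering each group on its own, as ROW_NUMBER() OVER (PARTITION BY l1 ORDER BY ...) would given a real ordering column, keeps A's rows consecutive.

```python
def gap_counter(rows, per_group, target="a"):
    """Replay the answer's gap test on (l1, l2) rows kept in input order.

    per_group=False numbers rows globally, like ROW_NUMBER() OVER ();
    per_group=True numbers each l1 separately, like
    ROW_NUMBER() OVER (PARTITION BY l1 ORDER BY ...)."""
    total = 0         # rows numbered so far overall
    per_l1 = {}       # l1 -> rows numbered so far in that group
    prev_number = {}  # l1 -> number of the previous kept `target` row
    breaks = {}       # l1 -> running count of gaps
    result = []
    for l1, l2 in rows:
        total += 1
        per_l1[l1] = per_l1.get(l1, 0) + 1
        n = per_l1[l1] if per_group else total
        if l2 != target:
            continue  # numbered first, then filtered, as in the query
        prev = prev_number.get(l1)
        changed = 1 if prev is not None and n - prev != 1 else 0
        breaks[l1] = breaks.get(l1, 0) + changed
        result.append((l1, l2, 1 + breaks[l1]))
        prev_number[l1] = n
    return result


# Groups A and B interleaved: A's two 'a' rows form one unbroken run.
INTERLEAVED = [("A", "a"), ("B", "a"), ("A", "a"), ("B", "b"), ("B", "a")]

print(gap_counter(INTERLEAVED, per_group=False))
# [('A', 'a', 1), ('B', 'a', 1), ('A', 'a', 2), ('B', 'a', 2)]  <- A is wrongly 2
print(gap_counter(INTERLEAVED, per_group=True))
# [('A', 'a', 1), ('B', 'a', 1), ('A', 'a', 1), ('B', 'a', 2)]
```

In both cases B's counter correctly reaches 2, because its 'b' row really does break the run.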
