SQL (BigQuery) - get rows with a specific value and add count when a sequence of that value is broken with another value by group
Suppose I have the following dataset:
'2022-01-01' - A - a
'2022-01-02' - A - a
'2022-01-03' - A - b
'2022-01-03' - A - a
'2022-01-01' - B - a
'2022-01-01' - B - b
'2022-01-02' - B - c
'2022-01-02' - B - a
'2022-01-03' - B - b
'2022-01-01' - C - a
'2022-01-02' - C - a
'2022-01-03' - C - a
'2022-01-01' - C - c
'2022-01-02' - C - a
'2022-01-03' - C - a
'2022-01-04' - C - b
'2022-01-05' - C - a
The query I am looking for should produce this result:
A - a - 1
A - a - 1
A - a - 2
B - a - 1
B - a - 2
C - a - 1
C - a - 1
C - a - 1
C - a - 2
C - a - 2
C - a - 3
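Basically, it drops every row whose third column is not 'a', and adds a counter that increments each time a sequence of 'a' values is broken and then resumes, within each group defined by the second column. How can this be achieved?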
An approach using window functions:
with sample as (
  select '2022-01-01' as date, 'A' as l1, 'a' as l2 UNION ALL
  select '2022-01-02' as date, 'A' as l1, 'a' as l2 UNION ALL
  select '2022-01-03' as date, 'A' as l1, 'b' as l2 UNION ALL
  select '2022-01-03' as date, 'A' as l1, 'a' as l2 UNION ALL
  select '2022-01-01' as date, 'B' as l1, 'a' as l2 UNION ALL
  select '2022-01-01' as date, 'B' as l1, 'b' as l2 UNION ALL
  select '2022-01-02' as date, 'B' as l1, 'c' as l2 UNION ALL
  select '2022-01-02' as date, 'B' as l1, 'a' as l2 UNION ALL
  select '2022-01-03' as date, 'B' as l1, 'b' as l2 UNION ALL
  select '2022-01-01' as date, 'C' as l1, 'a' as l2 UNION ALL
  select '2022-01-02' as date, 'C' as l1, 'a' as l2 UNION ALL
  select '2022-01-03' as date, 'C' as l1, 'a' as l2 UNION ALL
  select '2022-01-01' as date, 'C' as l1, 'c' as l2 UNION ALL
  select '2022-01-02' as date, 'C' as l1, 'a' as l2 UNION ALL
  select '2022-01-03' as date, 'C' as l1, 'a' as l2 UNION ALL
  select '2022-01-04' as date, 'C' as l1, 'b' as l2 UNION ALL
  select '2022-01-05' as date, 'C' as l1, 'a' as l2
),
-- Number every row so that gaps left by the filtered-out rows can be detected.
number_rows AS (
  SELECT
    *,
    ROW_NUMBER() OVER () as row_number -- ideally you would have a column to order the rows
  FROM sample
),
-- Keep only the 'a' rows; flag a row when the previous 'a' row of the same
-- group is not the row directly before it. For the first row of a group
-- LAG returns NULL, so the IF falls through to 0.
flag_changes AS (
  SELECT
    *,
    IF((row_number - LAG(row_number) OVER (PARTITION BY l1 ORDER BY row_number ASC)) != 1, 1, 0) as changed
  FROM number_rows
  WHERE l2 = 'a'
)
-- The running sum of the flags, plus 1, is the sequence counter.
SELECT
  * EXCEPT (row_number, changed),
  1 + SUM(changed) OVER (PARTITION BY l1 ORDER BY row_number ASC) AS counter
FROM flag_changes
Output (the date column is omitted here):
l1 l2 counter
A a 1
A a 1
A a 2
B a 1
B a 2
C a 1
C a 1
C a 1
C a 2
C a 2
C a 3
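As a cross-check of the output above, here is a minimal Python sketch that mirrors the three steps of the query (number the rows, keep the 'a' rows and flag gaps, accumulate the flags). It is not part of the original answer. The names `ROWS` and `run_counters` are made up for illustration, and list position stands in for `ROW_NUMBER()`, which assumes the rows arrive in the order shown in the sample.

```python
# Sample rows in the order they appear in the question: (date, l1, l2).
ROWS = [
    ("2022-01-01", "A", "a"), ("2022-01-02", "A", "a"),
    ("2022-01-03", "A", "b"), ("2022-01-03", "A", "a"),
    ("2022-01-01", "B", "a"), ("2022-01-01", "B", "b"),
    ("2022-01-02", "B", "c"), ("2022-01-02", "B", "a"),
    ("2022-01-03", "B", "b"),
    ("2022-01-01", "C", "a"), ("2022-01-02", "C", "a"),
    ("2022-01-03", "C", "a"), ("2022-01-01", "C", "c"),
    ("2022-01-02", "C", "a"), ("2022-01-03", "C", "a"),
    ("2022-01-04", "C", "b"), ("2022-01-05", "C", "a"),
]


def run_counters(rows, target="a"):
    """Return (l1, l2, counter) for each row whose l2 equals `target`."""
    last_seen = {}  # l1 -> row number of the previous target row (the LAG value)
    counters = {}   # l1 -> current counter (1 + running SUM(changed))
    result = []
    # Assumption: list position plays the role of ROW_NUMBER().
    for row_number, (_date, l1, l2) in enumerate(rows, start=1):
        if l2 != target:  # WHERE l2 = 'a'
            continue
        previous = last_seen.get(l1)
        # A gap larger than 1 means a filtered-out row sat in between.
        changed = 1 if previous is not None and row_number - previous != 1 else 0
        counters[l1] = counters.get(l1, 1) + changed
        last_seen[l1] = row_number
        result.append((l1, l2, counters[l1]))
    return result


for l1, l2, counter in run_counters(ROWS):
    print(l1, l2, counter)
```

Like the query, this compares global row numbers inside each group, so it relies on the rows of a group being contiguous in the input.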
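Note: this solution depends on the rows being ordered and numbered in the same way as they appear in your sample data. In the example I used ROW_NUMBER() OVER (), but if you use that on a large table you may get different results on each run.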
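To illustrate why the ordering matters, the following Python sketch applies the same gap-flagging logic to one small group twice: once in the order the rows happen to arrive, and once after sorting by an explicit sequence column. The `seq` column is hypothetical (the original data has no such column) and the function name is made up for this example.

```python
def counters_in_arrival_order(rows, target="a"):
    """rows are (seq, l1, l2) tuples; numbering follows list position."""
    last_seen, counters, result = {}, {}, []
    for position, (seq, l1, l2) in enumerate(rows, start=1):
        if l2 != target:
            continue
        previous = last_seen.get(l1)
        changed = 1 if previous is not None and position - previous != 1 else 0
        counters[l1] = counters.get(l1, 1) + changed
        last_seen[l1] = position
        result.append((seq, l1, counters[l1]))
    return result


# Group A from the sample, tagged with a hypothetical sequence number.
rows = [(1, "A", "a"), (2, "A", "a"), (3, "A", "b"), (4, "A", "a")]

# Simulate the table being read back in a different order.
scrambled = list(reversed(rows))

# Numbering by arrival order: the counters depend on how rows came back.
unordered = counters_in_arrival_order(scrambled)

# Sorting by seq first: the analogue of ROW_NUMBER() OVER (ORDER BY seq).
ordered = counters_in_arrival_order(sorted(scrambled, key=lambda r: r[0]))

print(unordered)  # [(4, 'A', 1), (2, 'A', 2), (1, 'A', 2)]
print(ordered)    # [(1, 'A', 1), (2, 'A', 1), (4, 'A', 2)]
```

In BigQuery terms, this corresponds to giving the window an explicit ORDER BY on a column that fixes the row order, if the table has one.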