Finding only rows with non-duplicated values within a window partition
I want to see why some descriptions differ for the same permit ID. Here is the table (I'm using Snowflake):
create or replace table permits (permit varchar(255), description varchar(255));
// dupe permits, dupe descriptions, throw out
INSERT INTO permits VALUES ('1', 'abc');
INSERT INTO permits VALUES ('1', 'abc');
// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('2', 'def1');
INSERT INTO permits VALUES ('2', 'def2');
INSERT INTO permits VALUES ('2', 'def3');
// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('3', NULL);
INSERT INTO permits VALUES ('3', 'ghi1');
// unique permit, throw out
INSERT INTO permits VALUES ('5', 'xyz');
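What I want is to query this table and pull out only the sets of rows that share a permit ID but have different descriptions.
The output I want looks like this: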
+---------+-------------+
| PERMIT | DESCRIPTION |
+---------+-------------+
| 2 | def1 |
| 2 | def2 |
| 2 | def3 |
| 3 | |
| 3 | ghi1 |
+---------+-------------+
I tried this:
with with_dupe_counts as (
    select
        count(permit) over (partition by permit order by permit) as permit_dupecount,
        count(description) over (partition by permit order by permit) as description_dupecount,
        permit,
        description
    from permits
)
select *
from with_dupe_counts
where permit_dupecount > 1
  and description_dupecount > 1
This gives me permits 1 and 2, counting the descriptions whether or not they are unique:
+------------------+-----------------------+--------+-------------+
| PERMIT_DUPECOUNT | DESCRIPTION_DUPECOUNT | PERMIT | DESCRIPTION |
+------------------+-----------------------+--------+-------------+
| 2 | 2 | 1 | abc |
| 2 | 2 | 1 | abc |
| 3 | 3 | 2 | def1 |
| 3 | 3 | 2 | def2 |
| 3 | 3 | 2 | def3 |
+------------------+-----------------------+--------+-------------+
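What I thought would work is
count(unique description) over (partition by permit order by permit) as description_dupecount
but as I've come to realize, a lot of things don't work in window functions. This question isn't necessarily "how do I get count(unique x) to work in a window function", because I don't know whether that is the best way to solve this.
I don't think a simple group by would work, because I want to get the original rows back.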
One method uses min(), max(), and count():
select *
from (select p.*,
             min(description) over (partition by permit) as min_d,
             max(description) over (partition by permit) as max_d,
             count(description) over (partition by permit) as cnt_d,
             count(*) over (partition by permit) as cnt,
             count(permit) over (partition by permit order by permit) as permit_dupecount
      from permits p
     )
-- min_d <> max_d: at least two different non-NULL descriptions in the partition
-- cnt_d <> cnt: at least one NULL description in the partition
where min_d <> max_d or cnt_d <> cnt;
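As a follow-up to the window-function approach: to my understanding, Snowflake accepts DISTINCT inside a window aggregate as long as the OVER clause has no ORDER BY, so something close to the count(unique x) idea from the question can be written directly. The sketch below is a variant under those assumptions, not the method given in the answer above. It adds 1 when the partition contains a NULL description, because count(distinct ...) skips NULLs, and it assumes Snowflake's QUALIFY clause for filtering on the window result.

```sql
-- Sketch: keep rows whose permit has more than one distinct description,
-- treating NULL as a value of its own.
-- Assumes Snowflake: QUALIFY, and windowed COUNT(DISTINCT ...) without ORDER BY.
select p.*
from permits p
qualify count(distinct p.description) over (partition by p.permit)
      + max(case when p.description is null then 1 else 0 end)
            over (partition by p.permit) > 1;
```

On the sample data the sum is 1 for permits 1 and 5, 3 for permit 2, and 2 for permit 3, so only the rows of permits 2 and 3 pass the filter.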
I would just use exists:
select p.*
from permits p
where exists (
    select 1
    from permits p1
    where p1.permit = p.permit and p1.description <> p.description
)
To handle null values, we can use the standard null-safe comparison operator IS DISTINCT FROM, which Snowflake supports:
select p.*
from permits p
where exists (
    select 1
    from permits p1
    where
        p1.permit = p.permit
        and p1.description is distinct from p.description
)
This should work:
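SELECT DISTINCT p1.permit, p1.description
FROM permits p1
JOIN permits p2 ON p1.permit = p2.permit
-- a NULL on either side must be tested in both directions, otherwise the
-- non-NULL row of a NULL/non-NULL pair (e.g. permit 3, 'ghi1') is dropped
WHERE p1.description != p2.description
   OR (p1.description IS NULL AND p2.description IS NOT NULL)
   OR (p1.description IS NOT NULL AND p2.description IS NULL)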
This is my go-to:
with x as (
    -- note: count(distinct ...) ignores NULL descriptions
    select permit, count(distinct description) cnt
    from permits p1
    group by permit
    having cnt > 1
)
select p.*
from x
join permits p
  on x.permit = p.permit;
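One caveat with this query on the sample data: because count(distinct description) skips NULLs, permit 3 (NULL and 'ghi1') gets a count of 1 and is filtered out, while the desired output keeps it. A minimal sketch of a NULL-aware variant follows, assuming a NULL description should count as one extra distinct value. The case expression is an addition for illustration and not part of the answer above.

```sql
with x as (
    select permit
    from permits
    group by permit
    -- distinct non-NULL descriptions, plus 1 if the permit has any NULL description
    having count(distinct description)
         + max(case when description is null then 1 else 0 end) > 1
)
select p.*
from x
join permits p
  on x.permit = p.permit;
```

With this change permit 3 scores 1 + 1 = 2 and is kept, while permits 1 and 5 still score 1 and are dropped.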