
Finding only rows with non-duplicated values within a window partition

I want to see why some descriptions differ for the same permit ID. Here is the table (I'm using Snowflake):

create or replace table permits (permit varchar(255), description varchar(255));

// dupe permits, dupe descriptions, throw out
INSERT INTO permits VALUES ('1', 'abc'); 
INSERT INTO permits VALUES ('1', 'abc');

// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('2', 'def1'); 
INSERT INTO permits VALUES ('2', 'def2');
INSERT INTO permits VALUES ('2', 'def3');

// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('3', NULL);   
INSERT INTO permits VALUES ('3', 'ghi1');

// unique permit, throw out
INSERT INTO permits VALUES ('5', 'xyz'); 

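What I want is to query this table and pull out only the sets of rows that share a permit ID but have different descriptions.

The output I want is this: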

+---------+-------------+
| PERMIT  | DESCRIPTION |
+---------+-------------+
|       2 | def1        |
|       2 | def2        |
|       2 | def3        |
|       3 |             |
|       3 | ghi1        |
+---------+-------------+

I tried this:

with with_dupe_counts as (
    select
        count(permit) over (partition by permit order by permit) as permit_dupecount,
        count(description) over (partition by permit order by permit) as description_dupecount,
        permit,
        description
    from permits
)
select *
from with_dupe_counts
where permit_dupecount > 1 
and description_dupecount > 1

This gives me permits 1 and 2, and it counts descriptions whether or not they are unique:

+------------------+-----------------------+--------+-------------+
| PERMIT_DUPECOUNT | DESCRIPTION_DUPECOUNT | PERMIT | DESCRIPTION |
+------------------+-----------------------+--------+-------------+
|                2 |                     2 |      1 | abc         |
|                2 |                     2 |      1 | abc         |
|                3 |                     3 |      2 | def1        |
|                3 |                     3 |      2 | def2        |
|                3 |                     3 |      2 | def3        |
+------------------+-----------------------+--------+-------------+

What I think would work is:

count(unique description) over (partition by permit order by permit) as description_dupecount

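But as I've come to realize, a lot of things don't work inside window functions. This question isn't necessarily "how do I get count(unique x) to work in a window function", because I don't know whether that is the best way to solve this.

I don't think a simple group by will work, because I want to get the original rows back.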
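
One way to get a per-partition distinct count without DISTINCT inside the window is the dense_rank trick: the ascending rank plus the descending rank minus 1 equals the number of distinct values in the partition, with NULL counted as one more value. The sketch below is only an illustration and rests on an assumption: it runs the query in Python's built-in sqlite3 module as a stand-in for Snowflake, so dialect details may differ.

```python
import sqlite3

# Sample rows from the question; None stands for SQL NULL.
ROWS = [("1", "abc"), ("1", "abc"),
        ("2", "def1"), ("2", "def2"), ("2", "def3"),
        ("3", None), ("3", "ghi1"),
        ("5", "xyz")]

# Assumption: an in-memory SQLite database stands in for Snowflake.
conn = sqlite3.connect(":memory:")
conn.execute("create table permits (permit text, description text)")
conn.executemany("insert into permits values (?, ?)", ROWS)

# dense_rank ascending + dense_rank descending - 1 is the number of
# distinct descriptions per permit; NULL is ranked as its own value.
QUERY = """
select permit, description
from (
    select permit, description,
           dense_rank() over (partition by permit order by description)
         + dense_rank() over (partition by permit order by description desc)
         - 1 as n_distinct
    from permits
)
where n_distinct > 1
order by permit, description
"""
distinct_rows = conn.execute(QUERY).fetchall()
print(distinct_rows)
```

On the sample data this keeps the three rows of permit 2 and both rows of permit 3, and the original rows come back without a join.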

One method uses min(), max(), and count():

select *
from (select p.*,
             min(description) over (partition by permit) as min_d,
             max(description) over (partition by permit) as max_d,
             count(description) over (partition by permit) as cnt_d,  -- non-NULL descriptions
             count(*) over (partition by permit) as cnt               -- all rows for the permit
      from permits p
     )
where min_d <> max_d or cnt_d <> cnt;
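
A quick check of this predicate, again sketched in Python's sqlite3 as a stand-in for Snowflake (an assumption). Permit '6' is a hypothetical extra row, not in the question, added to show an edge case: a permit whose descriptions are all NULL has cnt_d = 0, so cnt_d <> cnt is true and it passes the filter even though nothing differs. The guarded predicate is one possible adjustment, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table permits (permit text, description text)")
conn.executemany(
    "insert into permits values (?, ?)",
    [("1", "abc"), ("1", "abc"),
     ("2", "def1"), ("2", "def2"), ("2", "def3"),
     ("3", None), ("3", "ghi1"),
     ("5", "xyz"),
     ("6", None)],  # hypothetical: a lone permit with no description
)

TEMPLATE = """
select permit, description
from (select p.*,
             min(description) over (partition by permit) as min_d,
             max(description) over (partition by permit) as max_d,
             count(description) over (partition by permit) as cnt_d,
             count(*) over (partition by permit) as cnt
      from permits p)
where {predicate}
order by permit, description
"""

# Predicate as written in the answer.
original = conn.execute(
    TEMPLATE.format(predicate="min_d <> max_d or cnt_d <> cnt")).fetchall()

# Guarded variant: ignore partitions that contain only NULL descriptions.
guarded = conn.execute(
    TEMPLATE.format(
        predicate="min_d <> max_d or (cnt_d > 0 and cnt_d <> cnt)")).fetchall()

print(original)
print(guarded)
```

With the question's own rows the two predicates agree; they differ only for permits that have no non-NULL description at all.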

I would just use exists:

select p.*
from permits p
where exists (
    select 1 
    from permits p1 
    where p1.permit = p.permit and p1.description <> p.description
)

To handle null values, we can use the standard null-safe comparison IS DISTINCT FROM, which Snowflake supports:

select p.*
from permits p
where exists (
    select 1 
    from permits p1 
    where 
        p1.permit = p.permit 
        and p1.description is distinct from p.description
)
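
To see what the null-safe comparison changes, the sketch below runs both forms of the exists query on the question's rows. It uses Python's sqlite3 as a stand-in for Snowflake (an assumption) and SQLite's `IS NOT` operator, which plays the role of IS DISTINCT FROM in this dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table permits (permit text, description text)")
conn.executemany(
    "insert into permits values (?, ?)",
    [("1", "abc"), ("1", "abc"),
     ("2", "def1"), ("2", "def2"), ("2", "def3"),
     ("3", None), ("3", "ghi1"),
     ("5", "xyz")],
)

QUERY = """
select p.permit, p.description
from permits p
where exists (
    select 1
    from permits p1
    where p1.permit = p.permit
      and p1.description {op} p.description
)
order by p.permit, p.description
"""

# '<>' yields NULL when either side is NULL, so permit 3 never matches.
plain = conn.execute(QUERY.format(op="<>")).fetchall()

# 'is not' treats NULL as a comparable value (SQLite's null-safe inequality).
null_safe = conn.execute(QUERY.format(op="is not")).fetchall()

print(plain)
print(null_safe)
```

The plain comparison returns only permit 2, while the null-safe form also returns both rows of permit 3.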

This should work:

SELECT DISTINCT p1.permit, p1.description
FROM permits p1
JOIN permits p2 ON p1.permit = p2.permit
WHERE p1.description != p2.description
   OR (p1.description IS NULL AND p2.description IS NOT NULL)
   OR (p1.description IS NOT NULL AND p2.description IS NULL)

Here is my go-to (note that count(distinct ...) ignores NULLs, so permit 3 is not returned):

with x as (
    select permit, count(distinct description) cnt
    from permits p1 
    group by permit
    having cnt > 1
    )
select p.*
from x
join permits p
  on x.permit = p.permit;
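
If NULL should count as a description of its own, one possible adjustment is to add 1 to the distinct count whenever the group contains a NULL. The sketch below tries this in Python's sqlite3 as a stand-in for Snowflake (an assumption); `max(description is null)` relies on SQLite evaluating the comparison to 0 or 1, and in Snowflake something like `max(iff(description is null, 1, 0))` would be the likely equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table permits (permit text, description text)")
conn.executemany(
    "insert into permits values (?, ?)",
    [("1", "abc"), ("1", "abc"),
     ("2", "def1"), ("2", "def2"), ("2", "def3"),
     ("3", None), ("3", "ghi1"),
     ("5", "xyz")],
)

QUERY = """
with x as (
    select permit
    from permits
    group by permit
    -- distinct non-NULL descriptions, plus 1 if the group has any NULL
    having count(distinct description) + max(description is null) > 1
)
select p.permit, p.description
from x
join permits p on x.permit = p.permit
order by p.permit, p.description
"""
group_rows = conn.execute(QUERY).fetchall()
print(group_rows)
```

Joining back to permits is what returns the original rows, which a group by alone cannot do.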
