Find only rows with non-duplicate values within a window partition

Question

I'd like to see why some descriptions differ for the same permit ID. Here is the table (I'm using Snowflake):

create or replace table permits (permit varchar(255), description varchar(255));

// dupe permits, dupe descriptions, throw out
INSERT INTO permits VALUES ('1', 'abc'); 
INSERT INTO permits VALUES ('1', 'abc');

// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('2', 'def1'); 
INSERT INTO permits VALUES ('2', 'def2');
INSERT INTO permits VALUES ('2', 'def3');

// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('3', NULL);   
INSERT INTO permits VALUES ('3', 'ghi1');

// unique permit, throw out
INSERT INTO permits VALUES ('5', 'xyz');

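What I want is to query this table and pull out only the sets of rows that have a duplicated permit ID but different descriptions.

The output I want looks like this: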

+---------+-------------+
| PERMIT  | DESCRIPTION |
+---------+-------------+
|       2 | def1        |
|       2 | def2        |
|       2 | def3        |
|       3 |             |
|       3 | ghi1        |
+---------+-------------+

I tried this:

with with_dupe_counts as (
    select
        count(permit) over (partition by permit order by permit) as permit_dupecount,
        count(description) over (partition by permit order by permit) as description_dupecount,
        permit,
        description
    from permits
)
select *
from with_dupe_counts
where permit_dupecount > 1 
and description_dupecount > 1

This gives me permits 1 and 2, and it counts descriptions whether or not they are unique:

+------------------+-----------------------+--------+-------------+
| PERMIT_DUPECOUNT | DESCRIPTION_DUPECOUNT | PERMIT | DESCRIPTION |
+------------------+-----------------------+--------+-------------+
|                2 |                     2 |      1 | abc         |
|                2 |                     2 |      1 | abc         |
|                3 |                     3 |      2 | def1        |
|                3 |                     3 |      2 | def2        |
|                3 |                     3 |      2 | def3        |
+------------------+-----------------------+--------+-------------+

What I thought would work is

count(unique description) over (partition by permit order by permit) as description_dupecount

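But as I've come to realize, a lot of things don't work in window functions. This question isn't necessarily "how do I get count(unique x) to work in a window function", because I don't know whether that is the best way to solve this.

I don't think a simple group by will work, because I want to get the original rows back.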

Answer 1

One method uses min(), max(), and count():

select *
from (select p.*,
             min(description) over (partition by permit) as min_d,
             max(description) over (partition by permit) as max_d,
             count(description) over (partition by permit) as cnt_d,
             count(*) over (partition by permit) as cnt,
             count(permit) over (partition by permit order by permit) as permit_dupecount
      from permits p
     )
where min_d <> max_d or cnt_d <> cnt;
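
As a rough, runnable check of the idea above (not the answerer's own code), the following Python sketch loads the question's sample rows into an in-memory SQLite database and runs the same min/max/count comparison. SQLite standing in for Snowflake is an assumption, and the helper names load_permits and differing_rows are made up for illustration. The condition min_d <> max_d catches partitions with at least two different non-NULL descriptions, while cnt_d <> cnt catches partitions that contain a NULL, because count(description) skips NULLs.

```python
import sqlite3

# Sample rows from the question; None stands for SQL NULL.
ROWS = [
    ("1", "abc"), ("1", "abc"),
    ("2", "def1"), ("2", "def2"), ("2", "def3"),
    ("3", None), ("3", "ghi1"),
    ("5", "xyz"),
]

def load_permits():
    """Return an in-memory SQLite connection holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table permits (permit text, description text)")
    conn.executemany("insert into permits values (?, ?)", ROWS)
    return conn

# Window functions need SQLite 3.25 or newer (assumed available here).
QUERY = """
select permit, description
from (select p.*,
             min(description) over (partition by permit) as min_d,
             max(description) over (partition by permit) as max_d,
             count(description) over (partition by permit) as cnt_d,
             count(*) over (partition by permit) as cnt
      from permits p
     )
where min_d <> max_d   -- two or more different non-NULL descriptions
   or cnt_d <> cnt     -- at least one NULL description in the partition
order by permit, description
"""

def differing_rows(conn):
    """Run the min/max/count filter and return (permit, description) rows."""
    return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for row in differing_rows(load_permits()):
        print(row)
```

One caveat worth keeping in mind: a permit that has a single row with a NULL description also satisfies cnt_d <> cnt, so adding "and cnt > 1" may be needed if such rows can occur in the real data.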

Answer 2

I would just use exists:

select p.*
from permits p
where exists (
    select 1 
    from permits p1 
    where p1.permit = p.permit and p1.description <> p.description
)

To handle null values, we can use the standard null-safe comparison operator IS DISTINCT FROM, which Snowflake supports:

select p.*
from permits p
where exists (
    select 1 
    from permits p1 
    where 
        p1.permit = p.permit 
        and p1.description is distinct from p.description
)
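
To see why the null-safe comparison matters, here is a small Python sketch that runs both variants of the exists query against the question's sample rows in an in-memory SQLite database. This is only an illustration under assumptions: SQLite stands in for Snowflake, SQLite's IS NOT operator is used as its null-safe counterpart of IS DISTINCT FROM, and the helper names load_permits and rows_with_other_description are hypothetical.

```python
import sqlite3

# Sample rows from the question; None stands for SQL NULL.
ROWS = [
    ("1", "abc"), ("1", "abc"),
    ("2", "def1"), ("2", "def2"), ("2", "def3"),
    ("3", None), ("3", "ghi1"),
    ("5", "xyz"),
]

def load_permits():
    """Return an in-memory SQLite connection holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table permits (permit text, description text)")
    conn.executemany("insert into permits values (?, ?)", ROWS)
    return conn

def rows_with_other_description(conn, op):
    """Return rows whose permit has another row with a different description.

    op is '<>' (plain inequality) or 'is not' (SQLite's null-safe
    inequality, used here in place of IS DISTINCT FROM).
    """
    if op not in ("<>", "is not"):
        raise ValueError("unsupported operator: " + op)
    sql = f"""
        select p.permit, p.description
        from permits p
        where exists (
            select 1
            from permits p1
            where p1.permit = p.permit
              and p1.description {op} p.description
        )
        order by p.permit, p.description
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = load_permits()
    print("plain <>  :", rows_with_other_description(conn, "<>"))
    print("null-safe :", rows_with_other_description(conn, "is not"))
```

With plain <>, any comparison against NULL evaluates to NULL rather than true, so both rows of permit 3 drop out; the null-safe form keeps them.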

Answer 3

This should work:

SELECT DISTINCT p1.permit, p1.description
FROM permits p1
JOIN permits p2 ON p1.permit = p2.permit
WHERE p1.description != p2.description
   OR (p1.description IS NULL AND p2.description IS NOT NULL)
   OR (p1.description IS NOT NULL AND p2.description IS NULL)

Answer 4

This is my go-to:

with x as (
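    select permit,
           -- count(distinct ...) skips NULLs, so treat a NULL description as one extra value
           count(distinct description)
             + max(case when description is null then 1 else 0 end) cnt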
    from permits p1 
    group by permit
    having cnt > 1
    )
select p.*
from x
join permits p
  on x.permit = p.permit;
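
One thing to watch with this approach is that count(distinct description) ignores NULLs, so a permit whose descriptions differ only by NULL versus non-NULL (permit 3 in the sample) is counted as having one value. The Python sketch below compares a plain distinct count with a NULL-aware one on the question's sample rows. It assumes SQLite as a stand-in for Snowflake, and the helper names load_permits and rows_by_distinct_count are invented for this illustration.

```python
import sqlite3

# Sample rows from the question; None stands for SQL NULL.
ROWS = [
    ("1", "abc"), ("1", "abc"),
    ("2", "def1"), ("2", "def2"), ("2", "def3"),
    ("3", None), ("3", "ghi1"),
    ("5", "xyz"),
]

def load_permits():
    """Return an in-memory SQLite connection holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table permits (permit text, description text)")
    conn.executemany("insert into permits values (?, ?)", ROWS)
    return conn

def rows_by_distinct_count(conn, null_aware):
    """Join back to the original rows of permits with more than one value.

    With null_aware=True, a NULL description counts as one extra value.
    """
    null_term = (
        "+ max(case when description is null then 1 else 0 end)"
        if null_aware else ""
    )
    sql = f"""
        with x as (
            select permit
            from permits
            group by permit
            having count(distinct description) {null_term} > 1
        )
        select p.permit, p.description
        from x
        join permits p on x.permit = p.permit
        order by p.permit, p.description
    """
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = load_permits()
    print("plain      :", rows_by_distinct_count(conn, null_aware=False))
    print("null-aware :", rows_by_distinct_count(conn, null_aware=True))
```

The group by plus join shape keeps the original rows, duplicates included, which is what the question asks for.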


Answer 1
1 accepted 2020-07-30 20:51:14

Answer 2
1 2020-07-30 20:51:59

Answer 3
0 2020-07-30 20:51:29

Answer 4
0 2020-07-30 21:19:07
