Finding only rows with non-duplicated values within a window partition

Question

I want to look at why some descriptions are different for the same permit id. Here's the table (I'm using Snowflake):

create or replace table permits (permit varchar(255), description varchar(255));

// dupe permits, dupe descriptions, throw out
INSERT INTO permits VALUES ('1', 'abc'); 
INSERT INTO permits VALUES ('1', 'abc');

// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('2', 'def1'); 
INSERT INTO permits VALUES ('2', 'def2');
INSERT INTO permits VALUES ('2', 'def3');

// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('3', NULL);   
INSERT INTO permits VALUES ('3', 'ghi1');

// unique permit, throw out
INSERT INTO permits VALUES ('5', 'xyz');

What I want is to query this table and get out only the sets of rows that have duplicate permit ids but different descriptions.

The output I want is this:

+---------+-------------+
| PERMIT  | DESCRIPTION |
+---------+-------------+
|       2 | def1        |
|       2 | def2        |
|       2 | def3        |
|       3 |             |
|       3 | ghi1        |
+---------+-------------+

I've tried this:

with with_dupe_counts as (
    select
        count(permit) over (partition by permit order by permit) as permit_dupecount,
        count(description) over (partition by permit order by permit) as description_dupecount,
        permit,
        description
    from permits
)
select *
from with_dupe_counts
where permit_dupecount > 1 
and description_dupecount > 1

Which gives me permits 1 and 2 and counts descriptions whether they are unique or not:

+------------------+-----------------------+--------+-------------+
| PERMIT_DUPECOUNT | DESCRIPTION_DUPECOUNT | PERMIT | DESCRIPTION |
+------------------+-----------------------+--------+-------------+
|                2 |                     2 |      1 | abc         |
|                2 |                     2 |      1 | abc         |
|                3 |                     3 |      2 | def1        |
|                3 |                     3 |      2 | def2        |
|                3 |                     3 |      2 | def3        |
+------------------+-----------------------+--------+-------------+

What I think would work would be

count(unique description) over (partition by permit order by permit) as description_dupecount

But as I'm realizing there are lots of things that don't work in window functions. This question isn't necessarily "how do I get count(unique x) to work in a window function" because I don't know if that is the best way to solve this.

A simple group by I don't think will work because I want to get the original rows back.

Answer 1

One method uses min() and max() and count() :

select *
from (select p.*,
             min(description) over (partition by permit) as min_d,
             max(description) over (partition by permit) as max_d,
             count(description) over (partition by permit) as cnt_d,
             count(*) over (partition by permit) as cnt,
            count(permit) over (partition by permit order by permit) as permit_dupecount
      from permits
     )
where min_d <> max_d or cnt_d <> cnt;

Answer 2

I would just use exists :

select p.*
from permits p
where exists (
    select 1 
    from permits p1 
    where p1.permit = p.permit and p1.description <> p.description
)

To handle the null values, we can use standard null-safe equality operator IS DISTINCT FROM , which Snowlake supports:

select p.*
from permits p
where exists (
    select 1 
    from permits p1 
    where 
        p1.permit = p.permit 
        and p1.description is distinct from p.description
)

Answer 3

Should work

SELECT DISTINCT p1.permit, p1.description
FROM permits p1
JOIN permits p2 ON p1.permit = p2.permit
WHERE p1.description != p2.description OR p1.description IS NULL AND p2.description IS NOT NULL

Answer 4

This is my go to:

with x as (
    select permit, count(distinct description) cnt
    from permits p1 
    group by permit
    having cnt > 1
    )
select p.*
from x
join permits p
  on x.permit = p.permit;

Finding only rows with non-duplicated values within a window partition

Question

4 answers

solution1
1 ACCPTED 2020-07-30 20:51:14

solution2
1 2020-07-30 20:51:59

solution3
0 2020-07-30 20:51:29

solution4
0 2020-07-30 21:19:07

Finding only rows with non-duplicated values within a window partition

Question

4 answers

solution1 1 ACCPTED 2020-07-30 20:51:14

solution2 1 2020-07-30 20:51:59

solution3 0 2020-07-30 20:51:29

solution4 0 2020-07-30 21:19:07

solution1
1 ACCPTED 2020-07-30 20:51:14

solution2
1 2020-07-30 20:51:59

solution3
0 2020-07-30 20:51:29

solution4
0 2020-07-30 21:19:07