I want to look at why some descriptions
are different for the same permit
id. Here's the table (I'm using Snowflake):
create or replace table permits (permit varchar(255), description varchar(255));
// dupe permits, dupe descriptions, throw out
INSERT INTO permits VALUES ('1', 'abc');
INSERT INTO permits VALUES ('1', 'abc');
// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('2', 'def1');
INSERT INTO permits VALUES ('2', 'def2');
INSERT INTO permits VALUES ('2', 'def3');
// dupe permits, unique descriptions, keep
INSERT INTO permits VALUES ('3', NULL);
INSERT INTO permits VALUES ('3', 'ghi1');
// unique permit, throw out
INSERT INTO permits VALUES ('5', 'xyz');
What I want is to query this table and get out only the sets of rows that have duplicate permit ids but different descriptions.
The output I want is this:
+---------+-------------+
| PERMIT | DESCRIPTION |
+---------+-------------+
| 2 | def1 |
| 2 | def2 |
| 2 | def3 |
| 3 | |
| 3 | ghi1 |
+---------+-------------+
I've tried this:
with with_dupe_counts as (
select
count(permit) over (partition by permit order by permit) as permit_dupecount,
count(description) over (partition by permit order by permit) as description_dupecount,
permit,
description
from permits
)
select *
from with_dupe_counts
where permit_dupecount > 1
and description_dupecount > 1
Which gives me permits 1 and 2 and counts descriptions whether they are unique or not:
+------------------+-----------------------+--------+-------------+
| PERMIT_DUPECOUNT | DESCRIPTION_DUPECOUNT | PERMIT | DESCRIPTION |
+------------------+-----------------------+--------+-------------+
| 2 | 2 | 1 | abc |
| 2 | 2 | 1 | abc |
| 3 | 3 | 2 | def1 |
| 3 | 3 | 2 | def2 |
| 3 | 3 | 2 | def3 |
+------------------+-----------------------+--------+-------------+
What I think would work would be
count(unique description) over (partition by permit order by permit) as description_dupecount
But as I'm realizing there are lots of things that don't work in window functions. This question isn't necessarily "how do I get count(unique x) to work in a window function" because I don't know if that is the best way to solve this.
A simple group by
I don't think will work because I want to get the original rows back.
One method uses min()
and max()
and count()
:
select *
from (select p.*,
min(description) over (partition by permit) as min_d,
max(description) over (partition by permit) as max_d,
count(description) over (partition by permit) as cnt_d,
count(*) over (partition by permit) as cnt,
count(permit) over (partition by permit order by permit) as permit_dupecount
from permits
)
where min_d <> max_d or cnt_d <> cnt;
I would just use exists
:
select p.*
from permits p
where exists (
select 1
from permits p1
where p1.permit = p.permit and p1.description <> p.description
)
To handle the null
values, we can use standard null-safe equality operator IS DISTINCT FROM
, which Snowlake supports:
select p.*
from permits p
where exists (
select 1
from permits p1
where
p1.permit = p.permit
and p1.description is distinct from p.description
)
Should work
SELECT DISTINCT p1.permit, p1.description
FROM permits p1
JOIN permits p2 ON p1.permit = p2.permit
WHERE p1.description != p2.description OR p1.description IS NULL AND p2.description IS NOT NULL
This is my go to:
with x as (
select permit, count(distinct description) cnt
from permits p1
group by permit
having cnt > 1
)
select p.*
from x
join permits p
on x.permit = p.permit;
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.