Find duplicates comparing 2 fields in PostgreSQL
I have a table with the following data:
id parent_id ascii_name lang
1 123 Foo en
2 123 Foo fi
3 456 Bar it
4 345 Foo fr
I want to select all the records that have the same parent_id and ascii_name. Basically, I want this:
id parent_id ascii_name lang
1 123 Foo en
2 123 Foo fi
Right now I am only able to select the records that share the same ascii_name:
id parent_id ascii_name lang
1 123 Foo en
2 123 Foo fi
4 345 Foo fr
using this query:
SELECT * FROM table WHERE ascii_name IN
(SELECT ascii_name FROM table GROUP BY ascii_name
HAVING count(ascii_name) > 1)
I don't know how to put the parent_id into the equation.
I found the right query by combining the answers from @jakub and @mucio:
SELECT * FROM geo_nodes_copy WHERE (parent_id,ascii_name) in
(SELECT parent_id, ascii_name
FROM geo_nodes_copy
GROUP By parent_id, ascii_name
HAVING count (1) > 1)
Now the only remaining problem may be the query speed.
Use the following query as a subquery:
SELECT parent_id,
ascii_name
FROM table
GROUP By parent_id,
ascii_name
HAVING count (1) > 1
This will return every parent_id / ascii_name pair that has multiple rows.
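One possible way to plug that grouped subquery into a full query is to join back against it. The sketch below is only an illustration: it uses a throwaway CTE named nodes (a hypothetical name) filled with the sample rows from the question, so it can be run without touching a real table. Note that a join, like IN, will not match rows where parent_id or ascii_name is NULL.

```sql
-- "nodes" is a hypothetical inline copy of the question's sample data.
WITH nodes (id, parent_id, ascii_name, lang) AS (
    VALUES (1, 123, 'Foo', 'en'),
           (2, 123, 'Foo', 'fi'),
           (3, 456, 'Bar', 'it'),
           (4, 345, 'Foo', 'fr')
)
SELECT n.*
FROM nodes AS n
JOIN (
    -- One row per (parent_id, ascii_name) pair that occurs more than once.
    SELECT parent_id, ascii_name
    FROM nodes
    GROUP BY parent_id, ascii_name
    HAVING count(*) > 1
) AS dup USING (parent_id, ascii_name)
ORDER BY n.id;
-- Returns the rows with id 1 and id 2.
```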
Well, since it's pg, you can use a row constructor:
SELECT * FROM table WHERE (ascii_name,parent_id) in
(SELECT ascii_name, parent_id FROM table GROUP By ascii_name, parent_id HAVING Count(ascii_name) > 1)
Use window functions:
select t.*
from (select t.*, count(*) over (partition by ascii_name, parent_id) as cnt
from table t
) t
where cnt >= 2;
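To see what the window function contributes, it can help to look at the inner query on its own. This is a small sketch over a hypothetical CTE named nodes that holds the question's sample rows; the cnt column is the size of each (ascii_name, parent_id) partition, and the outer filter simply keeps rows where it is at least 2.

```sql
-- "nodes" is a hypothetical inline copy of the question's sample data.
WITH nodes (id, parent_id, ascii_name, lang) AS (
    VALUES (1, 123, 'Foo', 'en'),
           (2, 123, 'Foo', 'fi'),
           (3, 456, 'Bar', 'it'),
           (4, 345, 'Foo', 'fr')
)
SELECT n.*,
       -- Number of rows sharing this row's ascii_name and parent_id.
       count(*) OVER (PARTITION BY ascii_name, parent_id) AS cnt
FROM nodes AS n
ORDER BY n.id;
-- cnt is 2 for ids 1 and 2, and 1 for ids 3 and 4.
```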
Under some circumstances, it might be a bit faster to use exists:
select t.*
from table t
where exists (select 1
from table t2
where t2.ascii_name = t.ascii_name and
t2.parent_id = t.parent_id and
t2.id <> t.id
);
For performance, include an index on table(ascii_name, parent_id, id).
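As a rough sketch of that suggestion, the statements below build a scratch table, add the composite index, and ask the planner how it runs the exists query. The table name geo_nodes_demo, the index name, and the column types are all assumptions made for illustration. On a table this small the planner will most likely ignore the index, so any real comparison should be run against the actual data.

```sql
-- Hypothetical scratch table; column types are assumed, not taken from the question.
CREATE TEMP TABLE geo_nodes_demo (
    id         integer PRIMARY KEY,
    parent_id  integer,
    ascii_name text,
    lang       text
);

INSERT INTO geo_nodes_demo (id, parent_id, ascii_name, lang) VALUES
    (1, 123, 'Foo', 'en'),
    (2, 123, 'Foo', 'fi'),
    (3, 456, 'Bar', 'it'),
    (4, 345, 'Foo', 'fr');

-- Composite index covering the columns used by the correlated subquery.
CREATE INDEX geo_nodes_demo_name_parent_id_idx
    ON geo_nodes_demo (ascii_name, parent_id, id);

-- Show the chosen plan and actual timings for the exists variant.
EXPLAIN ANALYZE
SELECT t.*
FROM geo_nodes_demo AS t
WHERE EXISTS (SELECT 1
              FROM geo_nodes_demo AS t2
              WHERE t2.ascii_name = t.ascii_name
                AND t2.parent_id  = t.parent_id
                AND t2.id        <> t.id);
```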
Assuming that a parent_id will always share the same ascii_name:
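SELECT a.*
FROM table a
WHERE a.ascii_name IN
-- IN instead of =: the subquery can return more than one row
(SELECT b.ascii_name
FROM table b
WHERE a.parent_id = b.parent_id
-- skip the row itself, otherwise every row would match
AND a.id <> b.id)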