Find duplicates comparing 2 fields in PostgreSQL

I have a table with the following data:

id  parent_id   ascii_name  lang
1   123         Foo         en
2   123         Foo         fi
3   456         Bar         it
4   345         Foo         fr
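
To try the queries on this page, the sample data can be loaded as follows. This is only a sketch: the table name geo_nodes_copy is borrowed from the update further down (a bare `table` is a reserved word in PostgreSQL), and the column types are assumptions, since the question shows only the values.

```sql
-- Hypothetical schema: the column types are assumed, not taken from the question.
CREATE TABLE geo_nodes_copy (
    id         integer PRIMARY KEY,
    parent_id  integer NOT NULL,
    ascii_name text    NOT NULL,
    lang       text    NOT NULL
);

-- The four sample rows from the question.
INSERT INTO geo_nodes_copy (id, parent_id, ascii_name, lang) VALUES
    (1, 123, 'Foo', 'en'),
    (2, 123, 'Foo', 'fi'),
    (3, 456, 'Bar', 'it'),
    (4, 345, 'Foo', 'fr');
```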

I want to select all the records that share the same parent_id and ascii_name. Basically, I want this:

id  parent_id   ascii_name  lang
1   123         Foo         en
2   123         Foo         fi

So far I have only managed to select the records that share the same ascii_name:

id  parent_id   ascii_name  lang
1   123         Foo         en
2   123         Foo         fi
4   345         Foo         fr

using the query:

SELECT *
FROM table
WHERE ascii_name IN (
    SELECT ascii_name
    FROM table
    GROUP BY ascii_name
    HAVING count(ascii_name) > 1
);

I don't know how to bring parent_id into the equation.

Update

I found the right query by combining the answers from @jakub and @mucio:

SELECT * FROM geo_nodes_copy WHERE (parent_id,ascii_name) in 
(SELECT parent_id, ascii_name 
 FROM geo_nodes_copy 
 GROUP By parent_id, ascii_name 
 HAVING count (1) > 1)

Now the only remaining problem may be the query speed.
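
One way to check whether speed really is a problem is to look at the execution plan. The sketch below wraps the query from the update in EXPLAIN ANALYZE, which runs the statement and reports the actual timings. The output depends on the data volume and indexes, so no expected plan is shown here.

```sql
-- EXPLAIN ANALYZE executes the query and prints the plan with real run times.
EXPLAIN ANALYZE
SELECT *
FROM geo_nodes_copy
WHERE (parent_id, ascii_name) IN (
    SELECT parent_id, ascii_name
    FROM geo_nodes_copy
    GROUP BY parent_id, ascii_name
    HAVING count(*) > 1
);
```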

Use the following query as a subquery:

   SELECT parent_id, 
          ascii_name 
     FROM table 
 GROUP By parent_id, 
          ascii_name 
   HAVING count (1) > 1

This returns every parent_id / ascii_name pair that occurs in more than one row.
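
As a sketch of how that subquery might be plugged in, it can be joined back to the table to fetch the full rows. The table name geo_nodes_copy is taken from the question's update, since a bare `table` is a reserved word in PostgreSQL. The alias `dup` is made up for this example.

```sql
-- Join each row to the duplicated (parent_id, ascii_name) pairs.
SELECT t.*
FROM geo_nodes_copy AS t
JOIN (
    SELECT parent_id, ascii_name
    FROM geo_nodes_copy
    GROUP BY parent_id, ascii_name
    HAVING count(*) > 1
) AS dup
  ON dup.parent_id  = t.parent_id
 AND dup.ascii_name = t.ascii_name
ORDER BY t.id;
```

With the four sample rows from the question, only the rows with id 1 and 2 match. Rows where parent_id or ascii_name is NULL never match in this join, because NULL does not compare equal to NULL.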

Since this is PostgreSQL, you can use a row constructor:

SELECT *
FROM table
WHERE (ascii_name, parent_id) IN (
    SELECT ascii_name, parent_id
    FROM table
    GROUP BY ascii_name, parent_id
    HAVING count(ascii_name) > 1
);

Use window functions:

select t.*
from (select t.*, count(*) over (partition by ascii_name, parent_id) as cnt
      from table t
     ) t
where cnt >= 2;

Under some circumstances, it might be a bit faster to use exists:

select t.*
from table t
where exists (select 1
              from table t2
              where t2.ascii_name = t.ascii_name and
                    t2.parent_id = t.parent_id and
                    t2.id <> t.id
             );

For performance, create an index on table(ascii_name, parent_id, id).
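
A minimal sketch of that index, assuming the table is the geo_nodes_copy from the question's update. The index name is hypothetical.

```sql
-- Composite index covering the columns compared in the exists subquery.
CREATE INDEX geo_nodes_copy_name_parent_id_idx
    ON geo_nodes_copy (ascii_name, parent_id, id);

-- Refresh planner statistics so the new index is costed with current data.
ANALYZE geo_nodes_copy;
```

Because all three columns used by the exists subquery are in the index, the planner may be able to answer the inner lookup from the index alone. Whether it does so depends on the data and on the table's visibility map.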

Assuming that a parent_id will always share the same ascii_name:

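-- IN replaces "=": a scalar subquery raises an error when it returns
-- more than one row. The id check stops a row from matching itself.
SELECT a.*
FROM table a
WHERE a.ascii_name IN
(SELECT b.ascii_name
 FROM table b
 WHERE a.parent_id = b.parent_id
   AND b.id <> a.id)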
