SQL select entire record found in group
I am using the metaphone function in PostgreSQL to find duplicate records that may have been misspelled.
SELECT metaphone(first_name, 4), metaphone(last_name, 4)
FROM people GROUP BY metaphone(last_name, 4),
metaphone(first_name, 4) HAVING COUNT(*) > 1;
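This works well for showing me that there are at least 100 potential duplicates in the database, but I cannot act on it, because the query results contain no uniquely identifying information. I have tried this: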
SELECT person_id, first_name, last_name
FROM people
WHERE metaphone(first_name, 16) IN (
SELECT metaphone(first_name, 16)
FROM people GROUP BY metaphone(last_name, 16),
metaphone(first_name, 16) HAVING COUNT(*) > 1
)
AND metaphone(last_name, 16) IN (
SELECT metaphone(last_name, 16)
FROM people GROUP BY metaphone(last_name, 16),
metaphone(first_name, 16) HAVING COUNT(*) > 1
)
ORDER BY last_name, first_name;
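This sort of works, but it still includes some records that do not actually match on both fields. For example, I could have 2 'John Smith', 2 'Jane Smith', and 2 'John Doe'. I might have only one 'Jane Doe', but she would still appear in the results of the second query, because her first-name code and her last-name code each occur in some duplicated group, just not in the same one.
Is there a way to more accurately get only the rows that were used to compile the results of the first query?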
You need to make both comparisons at once:
SELECT person_id, first_name, last_name
FROM people
-- Compare the (first, last) codes as one row value, so both must come from the same group.
WHERE (metaphone(first_name, 16), metaphone(last_name, 16)) IN (
SELECT metaphone(first_name, 16), metaphone(last_name, 16)
FROM people
GROUP BY metaphone(first_name, 16), metaphone(last_name, 16)
HAVING COUNT(*) > 1
)
ORDER BY last_name, first_name;
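As a possible alternative, the same rows can probably be obtained with a window function instead of a subquery. This is only a sketch, not part of the original answer: it assumes the `people` table and columns from the question and that `metaphone` is available (in PostgreSQL it is provided by the `fuzzystrmatch` extension). The names `counted` and `group_size` are made up for illustration.

```sql
-- Count how many rows share each (first, last) metaphone pair,
-- then keep only the rows whose pair occurs more than once.
SELECT person_id, first_name, last_name
FROM (
    SELECT person_id,
           first_name,
           last_name,
           COUNT(*) OVER (
               PARTITION BY metaphone(first_name, 16),
                            metaphone(last_name, 16)
           ) AS group_size  -- hypothetical alias
    FROM people
) AS counted                -- hypothetical alias
WHERE group_size > 1
ORDER BY last_name, first_name;
```

Because the count is computed per row, the filter has to sit in an outer query: window functions cannot be used directly in a `WHERE` clause. Whether this performs better than the `IN` version depends on the data and the planner, so it is worth comparing both with `EXPLAIN ANALYZE`.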