I have a query that finds duplicates in my SQL database. Now how do I delete those duplicates?
I have an SQL query that finds and groups these duplicates using very complicated conditions:
SELECT right(post_url, LOCATE('-', REVERSE(post_url),LOCATE('-',REVERSE(post_url))+1) -1) as name,
left(post_name,LOCATE('-',post_url,LOCATE('-',post_url)+1) - 1) as city,
post_title as original,ID,post_name,count(*)
FROM table WHERE post_type='finder'
GROUP BY name,city having count(*) > 1
To explain the query: post_url is basically a URL slug that ends with someone's name, e.g. new-jersey-something-something-donald-t
I go to the second dash from the right and get the name that way. Then I get the city/state, which is everything before the second dash from the left.
In this manner, I've successfully found the duplicates in this database, but I'm having trouble thinking of a way to isolate the duplicate and delete it.
In addition, I only want to delete the copy that does not have %near% in post_url.
My question is: using the query here, how would I change it to delete the duplicate?
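As a quick sanity check, the two expressions can be run against the sample slug as a string literal. This is only a sketch: the user variable @url is introduced here for illustration, and both expressions read from that one value, whereas the query above passes post_name to the outer LEFT.

```sql
-- Hypothetical user variable holding the sample slug from the question
SET @url = 'new-jersey-something-something-donald-t';

SELECT
  -- everything after the second dash counted from the right
  RIGHT(@url, LOCATE('-', REVERSE(@url), LOCATE('-', REVERSE(@url)) + 1) - 1) AS name,
  -- everything before the second dash counted from the left
  LEFT(@url, LOCATE('-', @url, LOCATE('-', @url) + 1) - 1) AS city;
-- name = 'donald-t', city = 'new-jersey'
```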
You're not going to be able to do it in one simple query. That's because you would need to write a query that looks something like this:
DELETE FROM table
WHERE id IN (SELECT ... FROM table WHERE ...)
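MySQL specifically prohibits this. You can't delete based on a subquery that selects directly from the same table; the statement fails with error 1093, "You can't specify target table ... for update in FROM clause". Rewriting this particular query with JOINs is not practical either: a multi-table DELETE can join the table to itself, but every join condition would have to repeat the long string expressions from the question.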
There is an easy solution, though: use a temporary table and two queries.
-- build the list of IDs to delete
CREATE TEMPORARY TABLE temp
SELECT ... FROM table WHERE ...;
-- now delete those items
DELETE FROM table
WHERE id IN (SELECT id FROM temp);
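Applied to the query from the question, the two steps might look roughly like the sketch below. The temporary table names keyed, dup_groups and ids_to_delete are made up for illustration, the name and city expressions are copied unchanged from the question, and `table` is the question's placeholder (quoted with backticks because TABLE is a reserved word). The extra HAVING condition is an assumption about intent: it skips any group that contains no %near% row at all, so a whole group is never deleted.

```sql
-- 1. Compute name and city once per row (expressions copied from the question)
CREATE TEMPORARY TABLE keyed AS
SELECT ID AS id,
       post_url,
       RIGHT(post_url, LOCATE('-', REVERSE(post_url), LOCATE('-', REVERSE(post_url)) + 1) - 1) AS name,
       LEFT(post_name, LOCATE('-', post_url, LOCATE('-', post_url) + 1) - 1) AS city
FROM `table`
WHERE post_type = 'finder';

-- 2. Keep only the (name, city) pairs that occur more than once
--    and have at least one row whose post_url contains 'near'
CREATE TEMPORARY TABLE dup_groups AS
SELECT name, city
FROM keyed
GROUP BY name, city
HAVING COUNT(*) > 1
   AND SUM(post_url LIKE '%near%') > 0;

-- 3. Within those groups, pick the rows WITHOUT 'near' in post_url
CREATE TEMPORARY TABLE ids_to_delete AS
SELECT k.id
FROM keyed AS k
JOIN dup_groups AS d
  ON d.name = k.name
 AND d.city = k.city
WHERE k.post_url NOT LIKE '%near%';

-- 4. The subquery no longer touches the target table, so MySQL accepts it
DELETE FROM `table`
WHERE ID IN (SELECT id FROM ids_to_delete);
```

Each temporary table is referenced only once per statement, which sidesteps MySQL's restriction on opening the same temporary table twice in one query. Running a plain SELECT against ids_to_delete before step 4 is a cheap way to review what will be removed.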
You can improve performance with JOINs and indexes.
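One way this could look, assuming the temp table built above has a single id column with no duplicate or NULL values (otherwise adding the key fails), and that `table` is still the placeholder name, backtick-quoted because TABLE is a reserved word:

```sql
-- Index the list of IDs so the join can look them up directly
ALTER TABLE temp ADD PRIMARY KEY (id);

-- Multi-table DELETE: remove every row of `table` that has a match in temp
DELETE t
FROM `table` AS t
JOIN temp ON temp.id = t.ID;
```

The join form deletes the same rows as the IN (SELECT ...) version; whether it is actually faster depends on the MySQL version and the table sizes, so it is worth comparing the two with EXPLAIN.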
The key to "isolating" the duplicates is to ensure that every row you want to delete has a primary key; that way you can easily build a list of IDs to delete. If your table doesn't have a primary key, you are reduced to writing WHERE clauses and JOINs on multiple columns, and that gets messy very quickly.