
I have a query that finds duplicates in my SQL database. Now how do I delete said duplicates?

I have an SQL query that finds and groups these duplicates using very complicated conditions:

SELECT right(post_url, LOCATE('-', REVERSE(post_url),LOCATE('-',REVERSE(post_url))+1) -1) as name,
left(post_name,LOCATE('-',post_url,LOCATE('-',post_url)+1) - 1) as city,
post_title as original,ID,post_name,count(*) 
FROM table WHERE post_type='finder' 
GROUP BY name,city having count(*) > 1

To explain the query: post_url is basically a URL name ending with someone's name, e.g. new-jersey-something-something-donald-t

I go to the second dash from the right and get the name that way. Then I get the city/state, which is everything before the second dash from the left. In this manner I've successfully found the duplicates in this database, but I'm having trouble thinking of a way to isolate the duplicate and delete it. In addition, I only want to delete the copy that does not have %near% in post_url. My question is: using the query here, how would I change this to delete the duplicate?

You're not going to be able to do it in one query. That's because you need to write a query that looks something like this:

DELETE FROM table
WHERE id IN (SELECT ... FROM table WHERE ...)

MySQL specifically prohibits this. You can't delete based on a subquery that reads directly from the same table; the statement fails with error 1093 ("You can't specify target table ... for update in FROM clause"). You also can't rewrite this query using JOINs.

There is an easy solution, though: use a temporary table and two queries.

-- build the list of IDs to delete
CREATE TEMPORARY TABLE temp
SELECT ... FROM table WHERE ...;

-- now delete those items
DELETE FROM table
WHERE id IN (SELECT id FROM temp);

You can improve performance with JOINs and indexes.

The key to "isolating" the duplicates is to ensure that every item you want to delete has a primary key. That way you can easily build a list of IDs to delete. If your table doesn't have a primary key, you are reduced to doing WHERE clauses and JOINs on multiple columns, and that gets messy very quickly.
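
To make the two-step pattern concrete, here is a minimal, runnable sketch. It uses Python's built-in sqlite3 module instead of MySQL so that it is self-contained, so treat it as an illustration of the idea rather than the exact statements to run. Everything beyond the column names from the question is an assumption: the table name posts, the temporary table name ids_to_delete, the sample rows, and the helper functions slug_name and slug_city, which stand in for the RIGHT/LEFT/LOCATE expressions in the question. It also assumes the city is taken from post_url, as the question's explanation describes.

```python
import sqlite3


def slug_name(url):
    # Hypothetical stand-in for the RIGHT/LOCATE/REVERSE expression:
    # the last two dash-separated pieces, e.g. "donald-t".
    return "-".join(url.split("-")[-2:])


def slug_city(url):
    # Hypothetical stand-in for the LEFT/LOCATE expression:
    # the first two dash-separated pieces, e.g. "new-jersey".
    return "-".join(url.split("-")[:2])


conn = sqlite3.connect(":memory:")
conn.create_function("slug_name", 1, slug_name)
conn.create_function("slug_city", 1, slug_city)

# Made-up sample data: rows 1 and 2 share the same name and city.
conn.executescript("""
CREATE TABLE posts (ID INTEGER PRIMARY KEY, post_url TEXT, post_type TEXT);
INSERT INTO posts VALUES
  (1, 'new-jersey-something-something-donald-t', 'finder'),
  (2, 'new-jersey-near-something-donald-t', 'finder'),
  (3, 'new-york-something-else-jane-d', 'finder');
""")

# Step 1: build the list of IDs to delete. A row qualifies if its
# (name, city) group has more than one member and its post_url
# does not contain "near".
conn.execute("""
CREATE TEMPORARY TABLE ids_to_delete AS
SELECT p.ID
FROM posts AS p
JOIN (
    SELECT slug_name(post_url) AS name, slug_city(post_url) AS city
    FROM posts
    WHERE post_type = 'finder'
    GROUP BY slug_name(post_url), slug_city(post_url)
    HAVING COUNT(*) > 1
) AS d
  ON slug_name(p.post_url) = d.name AND slug_city(p.post_url) = d.city
WHERE p.post_type = 'finder' AND p.post_url NOT LIKE '%near%'
""")

# Step 2: delete by primary key, reading only from the temporary table.
conn.execute("DELETE FROM posts WHERE ID IN (SELECT ID FROM ids_to_delete)")

remaining = [row[0] for row in conn.execute("SELECT ID FROM posts ORDER BY ID")]
print(remaining)
```

One caveat with this filter: if no copy in a group contains "near", every copy in that group lands in the delete list. Running a SELECT on the temporary table before the DELETE is a cheap way to check what will be removed.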
