SQL: delete all but 2 duplicates

I want to limit the number of duplicate records in a MySQL database table to 2.

(Excluding the id field, which is auto-increment.)

My table is set up like this:

id    city      item
---------------------
1     Miami     4
2     Detroit   5
3     Miami     4
4     Miami     18
5     Miami     4

So in that table, only row 5 would be deleted.

How can I do this?

MySQL has some foibles when reading from and writing to the same table, so I don't actually know if this will work. The syntax is fine in many implementations of SQL, but I don't know if it's MySQL-friendly:

DELETE FROM
  yourTable
WHERE
  -- delete a row when at least 2 matching rows with a smaller id exist
  1 < (SELECT COUNT(*)
       FROM yourTable AS Lookup
       WHERE Lookup.city = yourTable.city
         AND Lookup.item = yourTable.item
         AND Lookup.id < yourTable.id)
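
Before running a DELETE like this, it can help to preview which rows the correlated count would pick. The following is a minimal sketch, not the answerer's code: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is runnable), loads the sample rows from the question, and turns the DELETE into a SELECT. The function name rows_to_delete and the alias t are hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, city, item)
SAMPLE = [
    (1, "Miami", 4),
    (2, "Detroit", 5),
    (3, "Miami", 4),
    (4, "Miami", 18),
    (5, "Miami", 4),
]

def rows_to_delete(rows):
    """Return ids that have at least 2 earlier rows with the same (city, item)."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute(
        "CREATE TABLE yourTable (id INTEGER PRIMARY KEY, city TEXT, item INTEGER)"
    )
    conn.executemany(
        "INSERT INTO yourTable (id, city, item) VALUES (?, ?, ?)", rows
    )
    found = conn.execute(
        """
        SELECT t.id FROM yourTable AS t
        WHERE 1 < (SELECT COUNT(*) FROM yourTable AS Lookup
                   WHERE Lookup.city = t.city
                     AND Lookup.item = t.item
                     AND Lookup.id < t.id)
        ORDER BY t.id
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in found]

print(rows_to_delete(SAMPLE))  # [5]
```

Only row 5 has two earlier rows with the same city and item, which matches the result the question asks for.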

EDIT

Amazingly convoluted, but worth a try?

DELETE
  yourTable
FROM
  yourTable
INNER JOIN
(
  SELECT
    id
  FROM
  (
    SELECT
      id
    FROM
      yourTable
    WHERE
      1 < (SELECT COUNT(*)
           FROM yourTable as Lookup
           WHERE city = yourTable.city AND item = yourTable.item AND id < yourTable.id)
  )
    AS inner_deletes
)
  AS deletes
    ON deletes.id = yourTable.id
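
The nested derived tables above appear intended to make the list of ids exist before the DELETE touches the table. The same two-phase idea can be sketched on the client side: read the ids first, then delete exactly those ids. This is a minimal illustration using Python's sqlite3 as a stand-in (an assumption; SQLite does not share MySQL's behaviour here, so it shows only the ordering of the two steps). The helper names delete_extra_duplicates and demo are hypothetical.

```python
import sqlite3

def delete_extra_duplicates(conn):
    """Phase 1: collect the ids to remove. Phase 2: delete exactly those ids."""
    ids = [row[0] for row in conn.execute(
        """
        SELECT t.id FROM yourTable AS t
        WHERE 1 < (SELECT COUNT(*) FROM yourTable AS Lookup
                   WHERE Lookup.city = t.city
                     AND Lookup.item = t.item
                     AND Lookup.id < t.id)
        """
    )]
    # The id list is fully materialized at this point, so the DELETE no
    # longer depends on a query against the table it is modifying.
    conn.executemany("DELETE FROM yourTable WHERE id = ?", [(i,) for i in ids])
    conn.commit()
    return sorted(ids)

def demo():
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute(
        "CREATE TABLE yourTable (id INTEGER PRIMARY KEY, city TEXT, item INTEGER)"
    )
    conn.executemany(
        "INSERT INTO yourTable (id, city, item) VALUES (?, ?, ?)",
        [(1, "Miami", 4), (2, "Detroit", 5), (3, "Miami", 4),
         (4, "Miami", 18), (5, "Miami", 4)],
    )
    deleted = delete_extra_duplicates(conn)
    remaining = [r[0] for r in conn.execute("SELECT id FROM yourTable ORDER BY id")]
    conn.close()
    return deleted, remaining

print(demo())  # ([5], [1, 2, 3, 4])
```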

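I think your problem is that your code and/or table structure allows duplicates to be inserted, and you are asking this when you should really be fixing the database and/or the code.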

I think a better solution is to avoid allowing more than 5 records in the first place: implement a validation so that if select count(*) > 3, you do not accept the new insert.

If you want to do this in the data tier, you have to use a stored procedure, because you first need to identify all the records that occur more than 3 times and then delete only the last one. Saludos
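
A validation of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the answer: Python's sqlite3 stands in for MySQL, the limit of 2 is taken from the question rather than from the figures in this answer, and the names LIMIT, insert_if_under_limit and demo are hypothetical.

```python
import sqlite3

LIMIT = 2  # assumed maximum number of rows per (city, item) pair

def insert_if_under_limit(conn, city, item, limit=LIMIT):
    """Insert the row only if fewer than `limit` matching rows already exist."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM yourTable WHERE city = ? AND item = ?",
        (city, item),
    ).fetchone()
    if count >= limit:
        return False  # reject: the pair already has the maximum number of rows
    conn.execute(
        "INSERT INTO yourTable (city, item) VALUES (?, ?)", (city, item)
    )
    conn.commit()
    return True

def demo():
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute(
        "CREATE TABLE yourTable ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, city TEXT, item INTEGER)"
    )
    results = [insert_if_under_limit(conn, "Miami", 4) for _ in range(3)]
    results.append(insert_if_under_limit(conn, "Detroit", 5))
    conn.close()
    return results

print(demo())  # [True, True, False, True]
```

The count and the insert are two separate statements here, so with several concurrent writers two inserts could both pass the check. On a real server the check would probably need to run inside a transaction with suitable locking, or inside a stored procedure as the answer suggests.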

Because MySQL is notoriously difficult when it comes to updating tables that are also being queried (see, for example, the answer from Dems), the best I can figure out is sadly more than one statement, but on the plus side it is fairly readable:

CREATE TEMPORARY TABLE Dump AS SELECT id FROM table1 WHERE id NOT IN
  (SELECT MIN(id) FROM table1 GROUP BY city, item UNION
   SELECT MAX(id) FROM table1 GROUP BY city, item);

DELETE FROM table1 WHERE id IN (SELECT id FROM Dump);

DROP TEMPORARY TABLE Dump;

I'm not sure whether it matters which duplicate is removed; this keeps the first and the last.
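
To see what "keeps the first and the last" means in practice, here is a small sketch that runs the same three statements against the question's sample rows plus one extra duplicate, row 6, which is an addition for illustration. Python's sqlite3 is used as a stand-in for MySQL (an assumption), and the function name keep_first_and_last is hypothetical.

```python
import sqlite3

# Question's sample rows plus one extra Miami/4 duplicate (id 6, illustrative)
ROWS = [
    (1, "Miami", 4),
    (2, "Detroit", 5),
    (3, "Miami", 4),
    (4, "Miami", 18),
    (5, "Miami", 4),
    (6, "Miami", 4),
]

def keep_first_and_last(rows):
    """Delete every row that is neither the lowest nor the highest id of its group."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, city TEXT, item INTEGER)"
    )
    conn.executemany("INSERT INTO table1 (id, city, item) VALUES (?, ?, ?)", rows)
    # Step 1: park the ids to delete in a temporary table
    conn.execute(
        """
        CREATE TEMPORARY TABLE Dump AS
        SELECT id FROM table1 WHERE id NOT IN
          (SELECT MIN(id) FROM table1 GROUP BY city, item UNION
           SELECT MAX(id) FROM table1 GROUP BY city, item)
        """
    )
    # Step 2: delete those ids, then Step 3: drop the temporary table
    conn.execute("DELETE FROM table1 WHERE id IN (SELECT id FROM Dump)")
    conn.execute("DROP TABLE Dump")
    conn.commit()
    remaining = [r[0] for r in conn.execute("SELECT id FROM table1 ORDER BY id")]
    conn.close()
    return remaining

print(keep_first_and_last(ROWS))  # [1, 2, 4, 6]
```

For the Miami/4 group the middle rows 3 and 5 are removed, while rows 1 and 6 survive. Note that this differs from the question's expected result, where the two oldest rows are kept.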

In your reply to Joachim's answer, you ask about keeping 3 or 5 rows; this is one way to accomplish it. Depending on how you are using this database, you could either call this in a loop or turn it into a stored procedure. Either way, you would keep running this entire block of code until Rows Affected = 0:

drop table if exists TempTable;
create table TempTable
select city, item,
       count(*) as record_count, 
       min(id) as ItemToDrop -- this could be changed to max() if you 
                             -- want to delete new stuff instead 
from YourTable
group by city, item
having count(*) > 2; -- This value = number of rows you save

delete from YourTable
where id in (select ItemToDrop from TempTable);
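
The "run until Rows Affected = 0" loop could be driven from application code along these lines. This is a sketch under assumptions, not the answerer's implementation: Python's sqlite3 stands in for MySQL, and because SQLite accepts a subquery on the table being deleted from, the TempTable step is folded into a single statement (on MySQL the temporary table from the answer would likely still be needed). It uses max(id), so the newest rows are dropped, which is the variant mentioned in the answer's comment. The names trim_duplicates and demo are hypothetical.

```python
import sqlite3

def trim_duplicates(conn, keep=2):
    """Drop the newest row of each oversized group until no group exceeds `keep`."""
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM YourTable WHERE id IN (
                SELECT MAX(id) FROM YourTable
                GROUP BY city, item
                HAVING COUNT(*) > ?)
            """,
            (keep,),
        )
        if cur.rowcount == 0:  # "Rows Affected = 0": nothing left to trim
            break
        total += cur.rowcount
    conn.commit()
    return total

def demo():
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
    conn.execute(
        "CREATE TABLE YourTable (id INTEGER PRIMARY KEY, city TEXT, item INTEGER)"
    )
    conn.executemany(
        "INSERT INTO YourTable (id, city, item) VALUES (?, ?, ?)",
        [(1, "Miami", 4), (2, "Detroit", 5), (3, "Miami", 4), (4, "Miami", 18),
         (5, "Miami", 4), (6, "Miami", 4), (7, "Miami", 4)],
    )
    deleted = trim_duplicates(conn, keep=2)
    remaining = [r[0] for r in conn.execute("SELECT id FROM YourTable ORDER BY id")]
    conn.close()
    return deleted, remaining

print(demo())  # (3, [1, 2, 3, 4])
```

Each pass removes at most one row per group, so a group with five duplicates needs three passes before it is down to two rows.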
