
SQL query: Delete all records from the table except latest N?

Is it possible to build a single MySQL query (without variables) to remove all records from the table except the latest N (sorted by id desc)?

Something like this, only it doesn't work :)

delete from table order by id ASC limit ((select count(*) from table ) - N)

Thanks.

You cannot delete the records that way; the main issue is that you cannot use a subquery to specify the value of a LIMIT clause.

This works (tested in MySQL 5.0.67):

DELETE FROM `table`
WHERE id NOT IN (
  SELECT id
  FROM (
    SELECT id
    FROM `table`
    ORDER BY id DESC
    LIMIT 42 -- keep this many records
  ) foo
);

The intermediate subquery is required. Without it we'd run into two errors:

  1. SQL Error (1093): You can't specify target table 'table' for update in FROM clause - MySQL doesn't allow you to refer to the table you are deleting from within a direct subquery.
  2. SQL Error (1235): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' - You can't use the LIMIT clause within a direct subquery of a NOT IN operator.

Fortunately, using an intermediate subquery allows us to bypass both of these limitations.
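The same statement can be tried without a MySQL server. Below is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in engine. SQLite does not raise MySQL's errors 1093 or 1235, so the derived table `foo` is not strictly needed there, but the statement runs unchanged. The table `items` and the helper `keep_latest_not_in` are hypothetical.

```python
import sqlite3


def keep_latest_not_in(conn: sqlite3.Connection, keep: int) -> None:
    """Delete every row of `items` except the `keep` rows with the highest id.

    Mirrors the double-subquery / NOT IN form; the inner derived table `foo`
    is the part MySQL needs to avoid errors 1093 and 1235.
    """
    conn.execute(
        """
        DELETE FROM items
        WHERE id NOT IN (
            SELECT id FROM (
                SELECT id FROM items ORDER BY id DESC LIMIT ?
            ) AS foo
        )
        """,
        (keep,),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO items (id, payload) VALUES (?, ?)",
                 [(i, f"row {i}") for i in range(1, 101)])
keep_latest_not_in(conn, 42)
remaining = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
print(len(remaining), remaining[0], remaining[-1])  # 42 59 100
```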


Nicole has pointed out that this query can be optimised significantly for certain use cases (such as this one). I recommend reading that answer as well to see if it fits yours.

I know I'm resurrecting quite an old question, but I recently ran into this issue and needed something that scales well to large numbers. There wasn't any existing performance data, and since this question has had quite a bit of attention, I thought I'd post what I found.

The solutions that actually worked were Alex Barrett's double sub-query/NOT IN method (similar to Bill Karwin's) and Quassnoi's LEFT JOIN method.

Unfortunately, both of the above methods create very large intermediate temporary tables, and performance degrades quickly as the number of records not being deleted gets large.

What I settled on uses Alex Barrett's double sub-query (thanks!) but with <= instead of NOT IN:

DELETE FROM `test_sandbox`
  WHERE id <= (
    SELECT id
    FROM (
      SELECT id
      FROM `test_sandbox`
      ORDER BY id DESC
      LIMIT 1 OFFSET 42 -- keep this many records
    ) foo
  )

It uses OFFSET to get the id of the (N+1)th newest record, then deletes that record and all older ones, so the newest N remain.

Since ordering is already an assumption of this problem (ORDER BY id DESC), <= is a perfect fit.

It is much faster, since the temporary table generated by the subquery contains just one record instead of N records.
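A minimal sketch of the <=/OFFSET idea, again assuming Python's sqlite3 as a stand-in engine and a hypothetical `items` table. It also shows a useful edge case: when the table holds N rows or fewer, the scalar subquery finds no row and yields NULL, `id <= NULL` is never true, and nothing is deleted.

```python
import sqlite3


def make_items(n: int) -> sqlite3.Connection:
    """Hypothetical helper: in-memory table `items` with ids 1..n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO items (id) VALUES (?)",
                     [(i,) for i in range(1, n + 1)])
    return conn


def keep_latest_offset(conn: sqlite3.Connection, keep: int) -> None:
    # The derived table yields the id of the (keep + 1)-th newest row, and
    # every row at or below that id is deleted. With <= keep rows the
    # subquery yields NULL, "id <= NULL" is never true, and nothing is deleted.
    conn.execute(
        """
        DELETE FROM items
        WHERE id <= (
            SELECT id FROM (
                SELECT id FROM items ORDER BY id DESC LIMIT 1 OFFSET ?
            ) AS foo
        )
        """,
        (keep,),
    )


big = make_items(100)
keep_latest_offset(big, 42)
big_result = big.execute("SELECT COUNT(*), MIN(id) FROM items").fetchone()
print(big_result)  # (42, 59)

small = make_items(10)
keep_latest_offset(small, 42)
small_result = small.execute("SELECT COUNT(*) FROM items").fetchone()
print(small_result)  # (10,)
```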

Test case

I tested the three working methods and the new method above in two test cases.

Both test cases use 10000 existing rows; the first test keeps 9000 (deletes the oldest 1000) and the second keeps 50 (deletes the oldest 9950).

+-----------+------------------------+----------------------+
|           | 10000 TOTAL, KEEP 9000 | 10000 TOTAL, KEEP 50 |
+-----------+------------------------+----------------------+
| NOT IN    |         3.2542 seconds |       0.1629 seconds |
| NOT IN v2 |         4.5863 seconds |       0.1650 seconds |
| <=,OFFSET |         0.0204 seconds |       0.1076 seconds |
+-----------+------------------------+----------------------+

What's interesting is that the <= method not only performs better across the board but actually gets faster the more you keep, instead of slower.

Unfortunately for all the answers given by other folks, you can't directly DELETE and SELECT from a given table in the same query.

DELETE FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable);

ERROR 1093 (HY000): You can't specify target table 'mytable' for update 
in FROM clause

Nor does MySQL support LIMIT in an IN subquery. These are limitations of MySQL.

DELETE FROM mytable WHERE id NOT IN 
  (SELECT id FROM mytable ORDER BY id DESC LIMIT 1);

ERROR 1235 (42000): This version of MySQL doesn't yet support 
'LIMIT & IN/ALL/ANY/SOME subquery'

The best answer I can come up with is to do this in two stages:

SELECT id FROM mytable ORDER BY id DESC LIMIT n; 

Collect the ids and make them into a comma-separated string:

DELETE FROM mytable WHERE id NOT IN ( ...comma-separated string... );

(Normally, interpolating a comma-separated list into an SQL statement introduces some risk of SQL injection, but in this case the values are not coming from an untrusted source; they are known to be integer values from the database itself.)

Note: though this doesn't get the job done in a single query, sometimes a simpler, get-it-done solution is the most effective.
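If you'd rather not interpolate at all, the same two stages can bind one placeholder per id. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a MySQL driver; the function `keep_latest_two_stage` and the table contents are hypothetical, and MySQL drivers for Python typically use `%s` rather than `?` as the placeholder.

```python
import sqlite3


def keep_latest_two_stage(conn: sqlite3.Connection, n: int) -> None:
    """Keep only the n rows of mytable with the highest id."""
    # Stage 1: fetch the ids to keep.
    keep_ids = [row[0] for row in conn.execute(
        "SELECT id FROM mytable ORDER BY id DESC LIMIT ?", (n,))]
    if not keep_ids:
        # Nothing to keep (n == 0 or empty table). Avoid emitting
        # "NOT IN ()", which MySQL rejects as a syntax error.
        conn.execute("DELETE FROM mytable")
        return
    # Stage 2: one "?" placeholder per id instead of pasting the values in.
    placeholders = ", ".join("?" for _ in keep_ids)
    conn.execute(f"DELETE FROM mytable WHERE id NOT IN ({placeholders})", keep_ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable (id) VALUES (?)",
                 [(i,) for i in (1, 2, 5, 8, 13, 21, 34)])
keep_latest_two_stage(conn, 3)
remaining = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(remaining)  # [13, 21, 34]
```

For very large n the placeholder list grows with it, and databases and drivers cap the number of bound parameters, so one of the single-statement answers may fit better there.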

DELETE  i1.*
FROM    items i1
LEFT JOIN
        (
        SELECT  id
        FROM    items ii
        ORDER BY
                id DESC
        LIMIT 20
        ) i2
ON      i1.id = i2.id
WHERE   i2.id IS NULL
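Before running a multi-table DELETE like the one above, it can help to run the same LEFT JOIN as a SELECT to preview which rows would go. A minimal sketch with Python's sqlite3 as a stand-in engine (SQLite has no multi-table DELETE syntax, so only the SELECT half is shown; the table `items` with 30 rows is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 31)])

# Same anti-join as the DELETE: rows of i1 with no partner among the
# newest 20 ids are exactly the rows that would be deleted.
to_delete = [row[0] for row in conn.execute(
    """
    SELECT i1.id
    FROM items AS i1
    LEFT JOIN (
        SELECT id FROM items ORDER BY id DESC LIMIT 20
    ) AS i2 ON i1.id = i2.id
    WHERE i2.id IS NULL
    ORDER BY i1.id
    """)]
print(to_delete)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```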

If your ids are incremental (with no gaps), use something like:

DELETE FROM `table` WHERE id <= (SELECT max_id FROM (SELECT MAX(id) AS max_id FROM `table`) AS t) - N
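This MAX(id) - N arithmetic relies on the ids being contiguous. Auto-increment values are not reused, so rows deleted earlier leave gaps, and then fewer than N rows survive. A small sketch with Python's sqlite3 as a stand-in (SQLite accepts the direct subquery, so no derived table is needed here; the table `t` and its ids are hypothetical):

```python
import sqlite3


def keep_by_max_minus_n(conn: sqlite3.Connection, n: int) -> None:
    # Keeps ids in (MAX(id) - n, MAX(id)]: n rows only when ids are contiguous.
    conn.execute("DELETE FROM t WHERE id <= (SELECT MAX(id) FROM t) - ?", (n,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
# ids 1..20 with 16, 17 and 18 missing, e.g. rows deleted earlier.
ids = [i for i in range(1, 21) if i not in (16, 17, 18)]
conn.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in ids])
keep_by_max_minus_n(conn, 5)
kept = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
print(kept)  # [19, 20]; only 2 rows survive, not 5
```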

To delete all the records except the last N, you may use the query reported below.

It's a single script but with several statements, so it's not really a single query in the way the original question intended.

You also need a variable and a prepared statement, because outside stored programs MySQL's LIMIT accepts only integer constants or prepared-statement placeholders, not user variables or subqueries.

Hope it may be useful anyway...

nnn is the number of rows to keep and theTable is the table you're working on.

I'm assuming you have an auto-incrementing column named id.

SELECT @ROWS_TO_DELETE := COUNT(*) - nnn FROM `theTable`;
SELECT @ROWS_TO_DELETE := IF(@ROWS_TO_DELETE<0,0,@ROWS_TO_DELETE);
PREPARE STMT FROM "DELETE FROM `theTable` ORDER BY `id` ASC LIMIT ?";
EXECUTE STMT USING @ROWS_TO_DELETE;
DEALLOCATE PREPARE STMT;

The good thing about this approach is performance: I've tested the query on a local DB with about 13,000 records, keeping the last 1,000. It runs in 0.08 seconds.

The script from the accepted answer...

DELETE FROM `table`
WHERE id NOT IN (
  SELECT id
  FROM (
    SELECT id
    FROM `table`
    ORDER BY id DESC
    LIMIT 42 -- keep this many records
  ) foo
);

...takes 0.55 seconds, about 7 times longer.

Test environment: MySQL 5.5.25 on a late-2011 i7 MacBook Pro with SSD.
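The count-then-delete idea behind the prepared statement can be sketched as follows, assuming Python's sqlite3 as a stand-in engine. Stock SQLite builds do not support `DELETE ... ORDER BY ... LIMIT`, so this sketch picks the oldest ids in a subquery instead; the helper `keep_latest_by_count` is hypothetical.

```python
import sqlite3


def keep_latest_by_count(conn: sqlite3.Connection, nnn: int) -> int:
    """Delete the oldest rows of theTable so that at most nnn remain."""
    # Step 1: count the surplus rows, clamped at 0 like the IF() above.
    (total,) = conn.execute("SELECT COUNT(*) FROM theTable").fetchone()
    rows_to_delete = max(total - nnn, 0)
    # Step 2: delete that many of the oldest rows, choosing their ids
    # in a subquery because DELETE ... LIMIT is unavailable here.
    conn.execute(
        "DELETE FROM theTable WHERE id IN "
        "(SELECT id FROM theTable ORDER BY id ASC LIMIT ?)",
        (rows_to_delete,),
    )
    return rows_to_delete


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO theTable (id) VALUES (?)", [(i,) for i in range(1, 51)])
deleted = keep_latest_by_count(conn, 10)
remaining = conn.execute("SELECT COUNT(*), MIN(id) FROM theTable").fetchone()
print(deleted, remaining)  # 40 (10, 41)
deleted_again = keep_latest_by_count(conn, 100)  # fewer rows than nnn
print(deleted_again)  # 0
```

The clamp matters more in SQLite than it might seem: a negative LIMIT there means "no limit", so an unclamped `COUNT(*) - nnn` below zero would delete every row.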

DELETE FROM `table` WHERE ID NOT IN
(SELECT max_id FROM (SELECT MAX(ID) AS max_id FROM `table`) AS t)

This keeps only the single newest row; the derived table `t` is needed to avoid error 1093.

Try the query below:

DELETE FROM tablename WHERE id <= (SELECT * FROM (SELECT (MAX(id)-10) FROM tablename ) AS a)

The inner subquery returns MAX(id) - 10, and the outer query deletes every record with an id at or below that value, so the top 10 records remain (provided the ids have no gaps).

Just wanted to throw this into the mix for anyone using Microsoft SQL Server instead of MySQL. The keyword LIMIT isn't supported by MSSQL, so you'll need to use an alternative. This code worked in SQL Server 2008 and is based on this SO post: https://stackoverflow.com/a/1104447/993856

-- Keep the last 10 most recent passwords for this user.
DECLARE @UserID int; SET @UserID = 1004
DECLARE @ThresholdID int -- Position of 10th password.
SELECT  @ThresholdID = UserPasswordHistoryID FROM
        (
            SELECT ROW_NUMBER()
            OVER (ORDER BY UserPasswordHistoryID DESC) AS RowNum, UserPasswordHistoryID
            FROM UserPasswordHistory
            WHERE UserID = @UserID
        ) sub
WHERE   (RowNum = 10) -- Keep this many records.

DELETE  UserPasswordHistory
WHERE   (UserID = @UserID)
        AND (UserPasswordHistoryID < @ThresholdID)

Admittedly, this is not elegant. If you're able to optimize this for Microsoft SQL, please share your solution. Thanks!
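The ROW_NUMBER() idea can also be expressed as a single DELETE that removes every row ranked past the 10th. A minimal sketch, assuming Python's sqlite3 as a stand-in engine (window functions need SQLite 3.25 or later; the sample rows are hypothetical). Incidentally, in the T-SQL above, a user with fewer than 10 rows leaves @ThresholdID at NULL, so its DELETE removes nothing, which is the desired behavior; the sketch below behaves the same way.

```python
import sqlite3


def keep_latest_passwords(conn: sqlite3.Connection, user_id: int, keep: int = 10) -> None:
    """Delete all but the `keep` newest history rows for one user."""
    # Rank this user's rows newest-first, then delete the ones ranked past `keep`.
    conn.execute(
        """
        DELETE FROM UserPasswordHistory
        WHERE UserPasswordHistoryID IN (
            SELECT UserPasswordHistoryID FROM (
                SELECT UserPasswordHistoryID,
                       ROW_NUMBER() OVER (ORDER BY UserPasswordHistoryID DESC) AS RowNum
                FROM UserPasswordHistory
                WHERE UserID = ?
            ) AS sub
            WHERE RowNum > ?
        )
        """,
        (user_id, keep),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UserPasswordHistory "
             "(UserPasswordHistoryID INTEGER PRIMARY KEY, UserID INTEGER)")
# User 1004 owns ids 1..15; user 2000 owns ids 16..18 and must be untouched.
rows = [(i, 1004) for i in range(1, 16)] + [(i, 2000) for i in range(16, 19)]
conn.executemany("INSERT INTO UserPasswordHistory VALUES (?, ?)", rows)
keep_latest_passwords(conn, 1004)
counts = dict(conn.execute(
    "SELECT UserID, COUNT(*) FROM UserPasswordHistory GROUP BY UserID"))
oldest_kept = conn.execute(
    "SELECT MIN(UserPasswordHistoryID) FROM UserPasswordHistory WHERE UserID = 1004"
).fetchone()[0]
print(counts, oldest_kept)  # {1004: 10, 2000: 3} 6
```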

If you need to delete the records based on some other column as well, here is a solution:

DELETE
FROM articles
WHERE id IN
    (SELECT id
     FROM
       (SELECT id
        FROM articles
        WHERE user_id = :userId
        ORDER BY created_at DESC LIMIT 500, 10000000) abc)
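  AND user_id = :userId

DELETE FROM table WHERE id NOT IN (
    SELECT id FROM table ORDER BY id DESC LIMIT 0, 10
)

As written, MySQL rejects this with error 1235 (LIMIT directly inside a NOT IN subquery); wrapping the inner SELECT in a derived table, as in the accepted answer, works around it.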

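This should work as well. Note that it must be a LEFT JOIN that keeps only the unmatched rows; an INNER JOIN against the newest N ids would delete exactly the rows you meant to keep:

DELETE t
FROM `table` AS t
LEFT JOIN (
    SELECT id
    FROM (
        SELECT id
        FROM `table`
        ORDER BY id DESC
        LIMIT N
    ) AS Temp
) AS Temp2 ON t.id = Temp2.id
WHERE Temp2.id IS NULL;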

What about:

SELECT del.id
FROM `table` AS del
LEFT JOIN `table` AS keep
       ON del.id < keep.id
GROUP BY del.id
HAVING COUNT(keep.id) >= N;

It returns the rows that have at least N newer rows, i.e. everything except the latest N. Could be useful?
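A small check of the self-join, assuming Python's sqlite3 as a stand-in engine and a hypothetical `items` table with gaps in its ids. It first previews the rows, then reuses the same query inside a DELETE. Each row is joined to every newer row, so the work grows roughly quadratically with table size; this is a curiosity more than a scalable option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)",
                 [(i,) for i in (3, 7, 8, 15, 20, 21, 40)])

N = 3
# For each row, count the strictly newer rows; rows with at least N of them
# are exactly the ones outside the latest N.
older_query = """
    SELECT del.id
    FROM items AS del
    LEFT JOIN items AS keep ON del.id < keep.id
    GROUP BY del.id
    HAVING COUNT(keep.id) >= ?
"""
older = [row[0] for row in conn.execute(older_query + " ORDER BY del.id", (N,))]
print(older)  # [3, 7, 8, 15]

# Turn the preview into the actual delete.
conn.execute(f"DELETE FROM items WHERE id IN ({older_query})", (N,))
remaining = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
print(remaining)  # [20, 21, 40]
```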

Stumbled across this and thought I'd post an update. This is a modification of something that was posted before. I would have commented, but unfortunately I don't have 50 reputation...

LOCK Tables TestTable WRITE, TestTable as TestTableRead READ;
DELETE FROM TestTable
WHERE ID <= (
  SELECT ID
  FROM TestTable as TestTableRead -- alias must match the second lock above
  ORDER BY ID DESC LIMIT 1 OFFSET 42 -- keep this many records
);
UNLOCK TABLES;

Using WHERE with OFFSET avoids the extra intermediate (derived-table) subquery. Normally you also cannot read and write the same table in the same query, as you might modify entries while they're being used; the locks allow this to work here. Under LOCK TABLES you cannot refer to a locked table twice under the same name in one query: each reference needs its own alias with its own lock, which is why the 'as TestTableRead' declaration is required. This is also safe for parallel access to the database by other processes. For performance numbers and further explanation, see the <=/OFFSET answer this is based on.

Tested with mysql Ver 15.1 Distrib 10.5.18-MariaDB. (MariaDB has allowed a DELETE to read its own target table in a subquery since 10.3.1; stock MySQL may still reject this form with error 1093.)

For further details on locks, see the LOCK TABLES documentation.

Using id for this task is not an option in many cases, for example a table of Twitter statuses. Here is a variant that uses a timestamp field instead; it keeps the newest 150000 rows:

delete from table 
where access_time <= 
(
    select access_time from  
    (
        select access_time from table 
            order by access_time desc limit 150000,1
    ) foo    
)
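One caveat when the column is not unique, like a timestamp: `<=` removes every row that shares the boundary value, so ties can leave fewer than N rows. A sketch with Python's sqlite3 as a stand-in engine and a hypothetical `statuses` table holding integer timestamps, keeping the newest 3 with the descending `<=` form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statuses (id INTEGER PRIMARY KEY, access_time INTEGER)")
# Two rows share access_time 300 right at the keep/delete boundary.
times = [100, 200, 300, 300, 400, 500]
conn.executemany("INSERT INTO statuses (access_time) VALUES (?)", [(t,) for t in times])

KEEP = 3
conn.execute(
    """
    DELETE FROM statuses
    WHERE access_time <= (
        SELECT access_time FROM (
            SELECT access_time FROM statuses
            ORDER BY access_time DESC
            LIMIT 1 OFFSET ?
        ) AS foo
    )
    """,
    (KEEP,),
)
kept = [row[0] for row in conn.execute(
    "SELECT access_time FROM statuses ORDER BY access_time")]
print(kept)  # [400, 500]; the tie at 300 costs one row
```

Adding a unique tiebreaker such as id to both the ORDER BY and the comparison would avoid this; that refinement is not shown here.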

Answering this after a long time... I came across the same situation and, instead of using the answers mentioned, came up with the following:

DELETE FROM table_name order by ID limit 10

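This deletes the 10 oldest records (lowest IDs) and keeps the rest. Note that it removes a fixed number of rows rather than keeping a fixed number, so you have to know in advance how many to delete.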

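Why not

DELETE FROM table ORDER BY id DESC LIMIT 1, 123456789

Just delete all but the first row (the order is DESC!), using a very, very large number as the second LIMIT argument. Caveat: MySQL's single-table DELETE accepts only LIMIT row_count, without an offset, so MySQL rejects this statement; the large-number trick works in a SELECT, for example inside a derived table as in the user_id answer above.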

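Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.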

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM