
Deleting huge chunks of data from mysql innodb

I need to delete a huge chunk of data from my production database, which is about 100GB in size. If possible, I would like to minimize my downtime.

My selection criteria for deleting are likely to be:

DELETE * FROM POSTING WHERE USER.ID=5 AND UPDATED_AT<100

What is the best way to delete it?

  • Build an index?
  • Write a sequential script that deletes by paginating through the rows, 1,000 at a time?

You can try the method mentioned in the MySQL documentation:

  1. Select the rows not to be deleted into an empty table that has the same structure as the original table:

    INSERT INTO t_copy SELECT * FROM t WHERE ... ;

  2. Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:

    RENAME TABLE t TO t_old, t_copy TO t;

  3. Drop the original table:

    DROP TABLE t_old;

If at all possible, use row-level binary logging rather than statement-level binary logging, at least during this operation (it reduces the number of locks). Perform your deletes in batches (1,000 is a decent size). Use the primary key as the criterion for each batch and order by the primary key, so that you delete rows that are physically close to each other.
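A single batch shaped like the following matches this advice. This is a sketch only: it assumes POSTING has an integer primary key `id` and that the filter columns are named `user_id` and `updated_at`, none of which is confirmed by the original post.

```sql
-- One batch: delete up to 1000 matching rows in primary-key order,
-- so each batch touches physically adjacent rows.
-- Re-run the statement until it reports 0 rows affected.
DELETE FROM POSTING
 WHERE user_id = 5
   AND updated_at < 100
 ORDER BY id
 LIMIT 1000;
```

MySQL's single-table DELETE supports ORDER BY and LIMIT, which is what makes this incremental approach possible without any application-side pagination state.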

The best way is to delete incrementally using a LIMIT clause (say, 10,000 rows at a time), but do not apply ordering. This allows MySQL to flush the results more often, and the transactions won't be huge. You can easily do it with any programming language you have installed that has a MySQL connector. Be sure to commit after each statement.
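As a sketch, the batch loop this answer describes might look like the following in Python. The table and column names, and the use of a DB-API connector such as mysql-connector-python, are assumptions for illustration, not details from the original answer.

```python
def delete_in_batches(run_delete, batch_size=10000):
    """Repeatedly run a LIMIT-ed DELETE until a batch comes back short.

    run_delete(n) must execute the DELETE with LIMIT n, commit, and
    return the number of rows it removed. Returns the total deleted.
    """
    total = 0
    while True:
        deleted = run_delete(batch_size)
        total += deleted
        if deleted < batch_size:  # last, partial batch: nothing left to delete
            return total

def mysql_runner(conn):
    """Adapter for a DB-API MySQL connection (e.g. mysql-connector-python).

    Table and column names here are illustrative.
    """
    def run_delete(n):
        cur = conn.cursor()
        cur.execute(
            "DELETE FROM POSTING WHERE user_id = 5 AND updated_at < 100 LIMIT %s",
            (n,),
        )
        conn.commit()  # committing per batch keeps each transaction small
        return cur.rowcount
    return run_delete
```

After opening a connection `conn`, this would be driven as `delete_in_batches(mysql_runner(conn))`; the loop stops as soon as a batch deletes fewer rows than the limit.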

An index will definitely help, but building it will take a while on a 100 GB table as well (it is worth creating anyway if you are going to reuse it in the future). By the way, your current query is incorrect, because it references a table USER that is not listed here. You should be careful with how you define the index, so that the optimizer can benefit from using it.
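For a filter combining an equality on the user id with a range on the update time, a composite index is the natural candidate. The index and column names below are illustrative, assuming the corrected query filters on columns of POSTING itself:

```sql
-- Equality column first, range column second, so the optimizer can
-- seek to user_id = 5 and then scan the updated_at range.
ALTER TABLE POSTING ADD INDEX idx_user_updated (user_id, updated_at);
```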

A while back I wanted to delete over 99% of the data from a table. The table was a sessions table with over 250 million rows, and I only wanted the most recent 500K. The fastest way I came up with was to select the 500,000 rows I wanted into another table, delete the old table, and rename the new table to replace the deleted one. This was about 100 times faster than doing a regular delete, which has to choose records and rebuild the table.

This also has the added benefit of reducing the table file size if you're using InnoDB with innodb_file_per_table = 1, because InnoDB tables never shrink.
