
MySQL InnoDB: Delete/Purge rows from very large databases

I am having some issues with deleting data from InnoDB tables. From what I am reading, most people say the only way to free up space is to export the wanted data, create a new table, and import it. This seems a very poor way of doing it, especially on a database which is nearly 3 TB.

The issue I am having is deleting data older than 3 months to try and free up disk space; once the data is deleted, the disk space does not seem to be freed up. Is there a way to purge or permanently delete rows/data to free up disk space?

Is there a more reliable way to free up disk space without dropping the database and restarting the service?

Could somebody please advise me on the best approach to handling deletion from a large database?

Thank you in advance for your time.

Thanks :)

One relatively efficient approach is to use table partitioning and remove old data by dropping partitions. It certainly requires more complicated maintenance, but it does work.

First, enable innodb_file_per_table so that each table (and partition) goes into its own file instead of a single huge ibdata file.
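A minimal sketch of checking and enabling the option; note that setting it at runtime only affects tables created (or rebuilt) afterwards, so tables already living in the shared ibdata file stay there until they are rebuilt:

```sql
-- Check the current setting.
SHOW VARIABLES LIKE 'innodb_file_per_table';

-- Enable it at runtime (requires SUPER privilege)...
SET GLOBAL innodb_file_per_table = ON;

-- ...and persist it in my.cnf so it survives a restart:
--   [mysqld]
--   innodb_file_per_table = 1
```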

Then, create a partitioned table with one partition per range of time (day, week, month; you pick it), which results in files of a sensible size for your data set.

CREATE TABLE foo (
        tid INT(7) UNSIGNED NOT NULL,
        yearmonth INT(6) UNSIGNED NOT NULL,
        data VARBINARY(255) NOT NULL,
        PRIMARY KEY (tid, yearmonth)
) ENGINE=InnoDB
PARTITION BY RANGE (yearmonth) (
        PARTITION p201304 VALUES LESS THAN (201304),
        PARTITION p201305 VALUES LESS THAN (201305),
        PARTITION p201306 VALUES LESS THAN (201306)
);

Looking in the database data directory you'll find a file for each partition. In this example, partition 'p201304' will contain all rows having yearmonth < 201304, 'p201305' will hold the rows for 2013-04, and 'p201306' will hold the rows for 2013-05.

In practice I have actually used an integer column containing a UNIX timestamp as the partitioning key; that way it's easier to adjust the size of the partitions as time goes by. The partition edges do not need to match any calendar boundaries: they can fall every 100,000 seconds, or at whatever spacing yields a sensible number of partitions (tens of them) while still keeping the individual data files small enough.
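A hypothetical variant of the table above illustrating this: the column and table names are made up, and the boundary values are arbitrary epoch timestamps chosen purely for spacing, not calendar months:

```sql
CREATE TABLE foo_ts (
        tid INT UNSIGNED NOT NULL,
        created INT UNSIGNED NOT NULL,      -- UNIX timestamp
        data VARBINARY(255) NOT NULL,
        -- the partitioning column must be part of every unique key
        PRIMARY KEY (tid, created)
) ENGINE=InnoDB
PARTITION BY RANGE (created) (
        PARTITION p0 VALUES LESS THAN (1370000000),
        PARTITION p1 VALUES LESS THAN (1372600000),
        PARTITION p2 VALUES LESS THAN (1375200000)
);
```

Leaving out a MAXVALUE catch-all partition keeps maintenance simple: new partitions can be appended with a plain ADD PARTITION instead of a REORGANIZE PARTITION.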

Then, set up a maintenance process which creates new partitions for new data (ALTER TABLE foo ADD PARTITION (PARTITION p201307 VALUES LESS THAN (201307))) and deletes old ones (ALTER TABLE foo DROP PARTITION p201304). Dropping a large partition is almost as fast as deleting a file, and it actually frees up disk space. It also won't fragment the other partitions by leaving empty space scattered inside them.

If possible, make sure your frequent queries access only one or a few partitions by specifying the partition key (yearmonth in the example above), or a range of it, in the WHERE clause. That makes them run much faster, as the database won't need to look inside every partition to find your data.
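For example, against the table above, a range on the partition key lets the optimizer prune the untouched partitions, and (in MySQL 5.6) EXPLAIN PARTITIONS shows which ones a query will actually read:

```sql
-- Touches only p201305 and p201306 thanks to partition pruning.
SELECT tid, data
FROM foo
WHERE yearmonth BETWEEN 201304 AND 201305;

-- Inspect the pruning (MySQL 5.6; from 5.7 on, plain EXPLAIN
-- includes a partitions column by default):
EXPLAIN PARTITIONS
SELECT tid, data FROM foo WHERE yearmonth = 201304;
```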

Even if you use the file_per_table option you will still have this issue. The only way to "fix" it is to rebuild the individual table:

OPTIMIZE TABLE bloated_table

Note that this will lock the table during the rebuild operation, and you must have enough free space to accommodate the new copy of the table. On some systems this is impractical.

If you're frequently deleting data, you probably need to rotate entire tables periodically. Dropping a table under InnoDB with file_per_table frees the disk space almost immediately. If you keep one table per month, you can simply drop the table holding the data from three months ago.
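A sketch of that rotation, with hypothetical table names (log_YYYYMM); each month a cron job or event would create next month's table and drop the one that has aged out:

```sql
-- Create next month's table with the same schema as the current one.
CREATE TABLE log_201307 LIKE log_201306;

-- Drop the table that is now past the 3-month retention window.
-- With innodb_file_per_table, its .ibd file is removed and the disk
-- space is returned to the filesystem almost immediately.
DROP TABLE log_201304;
```

Queries spanning several months then need a UNION over the monthly tables (or a MERGE-style view), which is the main cost of this approach.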

Is it ugly to work with these? Yes. Is there an alternative? Not really. You can try going down the table-partitioning rabbit hole, but that often ends up being more trouble than it's worth.
