Purging data from MySQL tables
I have a cron job set up to back up my production MySQL tables, and I want to purge data from those tables at regular intervals. I have to delete data across multiple tables that reference each other by ID.
Some background: I need to delete about 2 million rows, and my app will be continuously reading from and writing to the database (though it shouldn't usually access the rows being deleted).
My question is how I should structure my delete query with respect to the following parameters:
Assumption:
The delete query you have is based on a range condition and not on the primary key.
Deleting all rows in one transaction. This results in a very long transaction that holds a large number of locks. It will increase replication lag, which is bad on its own, and a new DC makes it much worse. Holding more locks also reduces your write throughput. (At the SERIALIZABLE isolation level, even read throughput may suffer.)
Deleting in batches. This is better than deleting everything at once, but because each delete is still range-based, every statement takes more locks than it strictly needs (gap locks and next-key locks). So batched range deletes have the same problems, just on a smaller scale.
Between deleting everything at once and deleting in batches, batching is preferable.
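As a rough illustration of the batched approach, here is a minimal Python sketch. It uses the standard-library sqlite3 module only so that it runs without a server; the table `events`, the column `purge_time`, and the function `purge_in_batches` are made-up names, not anything from the question. On MySQL the statement inside the loop could be `DELETE FROM events WHERE purge_time < %s LIMIT %s`, since MySQL accepts LIMIT on single-table deletes; the rowid subquery below is just the SQLite stand-in for that.

```python
import sqlite3


def purge_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows with purge_time < cutoff, committing after every batch.

    Table and column names are hypothetical. Returns the number of rows deleted.
    """
    total = 0
    while True:
        # Bounded delete: at most batch_size rows per statement.
        cur = conn.execute(
            "DELETE FROM events WHERE rowid IN ("
            "SELECT rowid FROM events WHERE purge_time < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        # Commit right away so each batch is its own short transaction.
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Each pass through the loop is a separate transaction, so locks are held only for one batch at a time, and the loop can be stopped and restarted without losing progress.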
Another way of doing it (we need to delete rows older than some time):
1. Have a daemon which runs every configured_time and:
   i. select pk from table where purge-time < your-purge-time. -- no locks
   ii. delete based on pk, using multiple threads. -- row-level locks, small transactions (across tables)
This approach ensures smaller transactions and only row-level locks (a delete based on the primary key takes only row-level locks). Your query is also simple, so you can re-run it even when only part of the deletes succeeded. And I don't think making these deletes atomic is a requirement.
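The select-then-delete-by-key idea could look roughly like the sketch below. It is only an illustration under several assumptions: the tables `orders` and `order_items`, their columns, and the function names are hypothetical; sqlite3 is used so the example runs standalone (a MySQL driver would use `%s` placeholders instead of `?`); and the chunks are processed sequentially rather than with multiple threads, to keep the example short.

```python
import sqlite3


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def purge_by_pk(conn, cutoff, chunk_size=500):
    """Delete expired parent rows and their child rows, one small chunk at a time.

    Hypothetical schema: orders(id, purge_time), order_items(id, order_id).
    Returns the number of parent rows deleted.
    """
    # Step i: collect the primary keys of expired rows with a plain read.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM orders WHERE purge_time < ?", (cutoff,))]

    deleted = 0
    for chunk in chunks(ids, chunk_size):
        marks = ",".join("?" * len(chunk))
        # Step ii: one short transaction per chunk, child rows first.
        # `with conn` commits on success and rolls back on an exception.
        with conn:
            conn.execute(
                f"DELETE FROM order_items WHERE order_id IN ({marks})", chunk)
            cur = conn.execute(
                f"DELETE FROM orders WHERE id IN ({marks})", chunk)
        deleted += cur.rowcount
    return deleted
```

Because each run starts by selecting whatever keys are still expired, calling `purge_by_pk` again after a partial failure simply picks up the remaining rows.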