I have a cron job set up to back up production MySQL tables, and I'm looking to purge data from those tables at regular intervals. I have to delete data across multiple tables referenced by IDs.
Some background: I need to delete about 2 million rows, and my app will be continuously reading from and writing to the DB (though it shouldn't usually access the rows being deleted).
My question is: how should I structure my delete query, given the following parameters?
Assumption:
The delete query you have is based on a range, not the primary key.
Deleting all rows in one transaction will create a very long transaction holding a large number of locks. This will increase replication lag; replication lag is bad, and a new DC makes it really bad. Holding that many locks will also reduce your write throughput (and under the SERIALIZABLE isolation level, read throughput may suffer as well).
Deleting in batches. Better than deleting everything at once, but because each delete matches a range, it still takes more locks than necessary (InnoDB takes gap locks and next-key locks for range conditions). So batched range deletes have the same problems, just smaller.
Between deleting everything at once and deleting in batches, batches are preferable.
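A batched range delete might be sketched like this. This is only an illustration: it uses Python with sqlite3 standing in for MySQL, and the table name `events`, column `purge_time`, and cutoff value are hypothetical. On MySQL you could use `DELETE ... WHERE purge_time < ? LIMIT n` directly instead of the subquery.

```python
import sqlite3

def delete_in_batches(conn, cutoff, batch_size=1000):
    """Purge expired rows in many short transactions instead of one huge one."""
    total = 0
    while True:
        # SQLite lacks DELETE ... LIMIT by default, so limit via a subquery;
        # each loop iteration is its own small transaction.
        cur = conn.execute(
            "DELETE FROM events WHERE rowid IN "
            "(SELECT rowid FROM events WHERE purge_time < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit per batch so locks are released quickly
        total += cur.rowcount
        if cur.rowcount < batch_size:
            break
    return total
```

Keeping each batch in its own transaction bounds lock hold time, but note that each batch still scans a range, so the gap/next-key locking concern above still applies on MySQL.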
Another way of doing it (we need to delete rows older than some cut-off time):
1. Have a daemon which runs every configured_time and:
   i. SELECT pk FROM table WHERE purge_time < your_purge_time -- takes no locks
   ii. Delete based on the pk, using multiple threads -- row-level locks, small transactions (across tables)
This approach ensures small transactions and only row-level locks (a delete based on the primary key takes only row-level locks). The queries are also simple, so you can re-run the job even if only part of the deletes succeeded. And I don't think these deletes need to be atomic as a group.