
Purging data from MySQL tables

I have a cron job set up to back up my production MySQL tables, and I'm looking to purge data from the tables at regular intervals. I have to delete data across multiple tables referenced by ids.

Some background: I need to delete about 2 million rows, and my app will be continuously reading from and writing to my db (though it shouldn't usually access the rows being deleted).

My question is how I should structure my delete queries with respect to the following:

  1. Delete in a single bulk query vs. deleting in batches?
  2. Delete across different tables in a single transaction vs. deleting without any transaction. Will there be any table-level locks if I delete inside transactions, even if I delete in batches?
  3. I do not have any partitions set up; would fragmentation be an issue?

Assumptions:

  1. Isolation level: REPEATABLE READ, MySQL's default isolation level.
  2. Your delete query matches a range and does not use the primary key.

  3. Deleting all rows in one transaction: you get a very long transaction holding a large set of locks. This will increase replication lag; replication lag is bad, and a replica in a new DC makes it really bad. Holding that many locks will also reduce your write throughput. (Under the SERIALIZABLE isolation level, read throughput might suffer as well.)

  4. Deleting in batches: better than deleting everything at once, but because the deletes match a range rather than exact primary keys, each delete takes more locks (gap locks and next-key locks). So batched range deletes have the same problems, just on a smaller scale.

Between deleting everything in one query and deleting in batches, batching is preferable.
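
A minimal sketch of a batched purge, assuming a hypothetical `events` table with a `created_at` column and a cutoff of '2020-01-01' (not from the question):

    -- Hypothetical names; each statement autocommits as its own short
    -- transaction, so locks are held only briefly.
    DELETE FROM events
    WHERE created_at < '2020-01-01 00:00:00'
    ORDER BY created_at
    LIMIT 10000;
    -- Re-run from the cron job until ROW_COUNT() = 0; sleeping between
    -- batches also gives replication time to catch up.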

Another way of doing it (we need to delete rows older than some purge time):

  1. Have a daemon which runs at every configured interval and:
     i. select pk from table where purge-time < your-purge-time -- no locks
     ii. delete based on pk, using multiple threads -- row-level locks, small transactions (across tables)

This approach ensures small transactions and only row-level locks (deletes based on the primary key take only row-level locks). The queries are also simple, so you can re-run them even when only part of the deletes succeeded. And I feel that having these be atomic is not a requirement.
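
A minimal SQL sketch of that two-step flow, again with hypothetical table names (`events`, `event_details`) and example ids:

    -- Step i: collect the primary keys to purge -- a plain consistent
    -- read, takes no locks.
    SELECT id FROM events
    WHERE purge_time < '2020-01-01 00:00:00';

    -- Step ii: each worker thread deletes one small chunk of those ids
    -- by primary key -- row-level locks only, one short transaction
    -- spanning the related tables.
    START TRANSACTION;
    DELETE FROM event_details WHERE event_id IN (101, 102, 103);
    DELETE FROM events        WHERE id       IN (101, 102, 103);
    COMMIT;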

Or

  1. Reduce your isolation level to READ COMMITTED; then even with batch deletes you should be fine. Under READ COMMITTED, locks are taken only on the matching rows, even when they are accessed via a secondary index, and gap locks are not used.
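
For reference, the level can be lowered for just the purging connection, so the rest of the application keeps REPEATABLE READ (table name hypothetical as before):

    -- Applies only to this session; under READ COMMITTED, InnoDB skips
    -- gap locks and releases locks on rows that did not match the range.
    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
    DELETE FROM events
    WHERE created_at < '2020-01-01 00:00:00'
    LIMIT 10000;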

Or

  1. If your model allows it, shard based on time and drop the old database itself :)
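
Short of full sharding, MySQL's native RANGE partitioning (a related technique, not what the answer literally proposes) gets a similar effect within one server: dropping a partition is a quick metadata operation instead of millions of row deletes. A hypothetical sketch:

    -- Hypothetical schema, partitioned by month on created_at; note the
    -- partitioning column must be part of the primary key.
    CREATE TABLE events (
        id         BIGINT   NOT NULL,
        created_at DATETIME NOT NULL,
        payload    VARCHAR(255),
        PRIMARY KEY (id, created_at)
    )
    PARTITION BY RANGE (TO_DAYS(created_at)) (
        PARTITION p202001 VALUES LESS THAN (TO_DAYS('2020-02-01')),
        PARTITION p202002 VALUES LESS THAN (TO_DAYS('2020-03-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );

    -- Purging a whole month is then near-instant:
    ALTER TABLE events DROP PARTITION p202001;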
