
bulk update to an unindexed column in a large InnoDB table

I have an InnoDB table in a MySQL 5.1 database that has about 27 million rows. This table has three unindexed mediumint unsigned columns that I want to be able to periodically reset to 0 across all rows. For example:

update myTable set countA = 0;

This very simple update query is hitting up against trouble with InnoDB's row-level locking. After locking too many rows, the update query fails with the well-documented error:

ERROR 1206 (HY000): The total number of locks exceeds the lock table size

The problem is that with such a large table the number of individual row locks has exceeded the space allocated for storing locks.

I have found a few suggestions for how to deal with this issue:

Lock the whole table to turn off row-locking
This seemed like the best, cleanest solution, and I have no problem with this particular table being locked up for a few minutes during these infrequent operations. The problem is, the given solution didn't actually work for me. Maybe it's something that used to work with older versions of MySQL?
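
For reference, the commonly suggested form looks roughly like this (a sketch using the table and column above; it did not avoid the error in my case):

    SET autocommit = 0;
    LOCK TABLES myTable WRITE;
    UPDATE myTable SET countA = 0;
    COMMIT;
    UNLOCK TABLES;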

Increase the size of the lock buffer
By increasing the value of the MySQL variable innodb_buffer_pool_size, we can create more room for row locks. I'm extremely uncomfortable with this solution because even if I can allocate sufficient space now, I'm setting myself up for failure as my tables grow. Also, it seems like a poor setup that requires room for gigabytes of arguably unnecessary locks.
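
For completeness, the current value can be checked from SQL, but on MySQL 5.1 this variable is not dynamic, so raising it means editing my.cnf and restarting the server (the 2G below is only a placeholder):

    -- current size of the buffer pool, in bytes
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    -- my.cnf change (requires a server restart):
    --   [mysqld]
    --   innodb_buffer_pool_size = 2G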

Index the affected columns (see comments)
If we are doing a bulk update to a single column that is supported by an appropriate index, then InnoDB can avoid locking all the rows; by using the index, it can lock only the affected rows. I actually tried this out, but found that managing these three indexes made my incremental updates a lot slower. Since I will run tens of millions of update queries adjusting these three counts for every instance of needing to reset them, I don't want to sacrifice the efficiency of the incremental updates.
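
For reference, this amounts to something like the following for each of the three columns (the index name is mine); with the index in place, the reset can restrict itself to the rows that actually need changing:

    ALTER TABLE myTable ADD INDEX idx_countA (countA);
    -- the index lets InnoDB find and lock only the non-zero rows
    UPDATE myTable SET countA = 0 WHERE countA != 0;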

Update the column in batches
The source document describes this as a work-around, but I found that it was very effective up to a point:

update myTable set countA = 0 where countA != 0 limit 500000;

By doing this repeatedly until the number of affected rows is less than the specified limit, all of the rows get updated. This solution broke down for me on particularly large tables, because the number of rows that can be updated in a single iteration drops sharply as MySQL has to scan further and further to find matching rows. By the time even 1,000 rows was too many to update in one execution, I still had millions of non-zero values left.
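
If it helps, the repetition can be driven from a small stored procedure rather than by hand (a sketch; the procedure name is mine, and it only automates the statement above rather than fixing its slowdown):

    DELIMITER //
    CREATE PROCEDURE resetCountA()
    BEGIN
        DECLARE done INT DEFAULT 0;
        REPEAT
            UPDATE myTable SET countA = 0 WHERE countA != 0 LIMIT 500000;
            SET done = (ROW_COUNT() = 0);   -- rows touched by the UPDATE above
        UNTIL done END REPEAT;
    END //
    DELIMITER ;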

So what possibilities do I have left?

  1. Stop using InnoDB: This would require some other reorganization of my current processes, but is something I would consider.
  2. Move count columns out of main table: If I have a CountA table, then I could reset the counts by using delete from CountA and I could retrieve the counts with an inner join against the main table. This would slow down my updates to individual counts as I would have to get the id from the main table before conditionally updating or inserting a row in the CountA table. Not great, but something I would consider.
  3. Something else that is both a clean solution and one that can be expected to grow reasonably well with my tables?

Update: With the help of the accepted response, I now have a batch-processing implementation that gets the job done in about five minutes. I would prefer that batch processing weren't necessary, but until a more direct solution comes along it seems to be. In case it helps the next person who stumbles onto this question, here is my related Java JDBC code. (The blog post linked from the accepted answer is recommended reading too.)

    // maxId holds the largest id in tableName; with autocommit on, each
    // executeUpdate() resets one contiguous id range in its own transaction.
    int batchsize = 10_000;
    PreparedStatement pstmt = connection.prepareStatement(
            "UPDATE tableName SET countA = 0, countB = 0, countC = 0 "
          + "WHERE id BETWEEN ? AND ?");
    for (int left = 0; left < maxId; left += batchsize) {
        pstmt.setInt(1, left + 1);          // lower bound of this batch (inclusive)
        pstmt.setInt(2, left + batchsize);  // upper bound of this batch (inclusive)
        pstmt.executeUpdate();
    }
    pstmt.close();

Plan A

I like chunking (batching). However, your sketch of code is not very efficient, and adding an OFFSET does not help. Instead, see my blog about walking through the table carefully. That is: find the 'next' 100-1000 rows; perform the UPDATE; loop. (Note: each chunk should be its own transaction.)

The technique for "finding the next N rows and remembering where you left off" depends on the PRIMARY KEY. My blog covers most scenarios (numeric, string, sparse, etc.). (The blog talks about DELETE, but it should be easily adaptable to UPDATE.)

InnoDB is beneficial for chunking because the PRIMARY KEY is clustered. Hence, each chunk would have to read a minimal number of blocks.
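
A rough sketch of one iteration of that walk, assuming an integer PRIMARY KEY named id and a user variable to remember where the previous chunk ended (the blog spells out the bookkeeping and the stopping condition):

    -- @left starts at 0; find the last id of the next 1000-row chunk
    SELECT MAX(id) INTO @right
        FROM ( SELECT id FROM myTable
               WHERE id > @left
               ORDER BY id
               LIMIT 1000 ) AS chunk;
    -- reset just that chunk (a range scan on the clustered PRIMARY KEY)
    UPDATE myTable SET countA = 0
        WHERE id > @left AND id <= @right;
    -- commit, advance the marker, and repeat until @right comes back NULL
    SET @left = @right;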

Plan B

Using a parallel table ("move count columns out of main table") is probably a good idea, because the number of disk blocks to touch would be smaller; hence it can be similar to Plan A, but faster. Use the same PRIMARY KEY (sans AUTO_INCREMENT).
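
A sketch of what that parallel table might look like (the table name is mine, and the id type should match the main table's primary key):

    CREATE TABLE myTableCounts (
        id     INT UNSIGNED       NOT NULL,           -- same values as myTable.id, no AUTO_INCREMENT
        countA MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
        countB MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
        countC MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
        PRIMARY KEY (id)
    ) ENGINE=InnoDB;
    -- the global reset now touches only this narrow table
    UPDATE myTableCounts SET countA = 0, countB = 0, countC = 0;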

Plan C

Combine (1) a parallel table (like Plan B) with (2) the convention that a missing row implies values = 0. Then clearing is achieved via TRUNCATE TABLE (unlike Plan A). Since you have three columns to clear out, the rules would be:

  • When any value is changed to non-zero, make sure the row exists in the parallel table, and set the value as needed (plus zeros for the others). Probably INSERT ... ON DUPLICATE KEY UPDATE ... is optimal.
  • When looking for the values (SELECT), do a LEFT JOIN and IFNULL(col, 0) to get the value or 0. (Both rules are sketched below.)
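
A rough sketch of those two rules plus the reset, reusing the parallel table from the Plan B sketch (the column names and the increment-by-one are assumptions):

    -- bump countA for one row, creating the row if it is missing
    INSERT INTO myTableCounts (id, countA, countB, countC)
        VALUES (?, 1, 0, 0)
        ON DUPLICATE KEY UPDATE countA = countA + 1;

    -- read the counts, treating a missing row as all zeros
    SELECT m.id,
           IFNULL(c.countA, 0) AS countA,
           IFNULL(c.countB, 0) AS countB,
           IFNULL(c.countC, 0) AS countC
        FROM myTable AS m
        LEFT JOIN myTableCounts AS c ON c.id = m.id
        WHERE m.id = ?;

    -- the periodic global reset becomes:
    TRUNCATE TABLE myTableCounts;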

Plan X (non-starter)

Indexing the columns would hurt -- When you UPDATE an indexed column, both the data and the index must be changed.
