
Force hidden clustered index in InnoDB

I have a table whose IDs are a hash of the "true primary key". Correct me if I'm wrong, but I think my inserts into this table are very slow because of the clustered index on this key (inserting 100,000 rows takes several minutes). When I change the key to a nonclustered index, I have the impression that InnoDB still secretly clusters on it.

Is there a simple way to prevent MySQL from clustering on my primary key without having to define an auto-increment primary key?

InnoDB must have a PRIMARY KEY.

  1. InnoDB's first preference is an explicit PRIMARY KEY, whether AUTO_INCREMENT or not.
  2. Failing that, it promotes the first UNIQUE key, but only if none of its columns are NULLable.
  3. Finally, InnoDB will create a hidden, 6-byte integer that acts somewhat like an AUTO_INCREMENT.
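The three cases above can be sketched with table definitions (table and column names here are illustrative, not from the question):

```sql
-- Case 1: explicit PRIMARY KEY -- InnoDB clusters rows on `id`.
CREATE TABLE t1 (
    id      BINARY(16) NOT NULL,   -- e.g. an MD5 hash
    payload TEXT,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Case 2: no PK, but a UNIQUE key over NOT NULL columns --
-- InnoDB silently promotes `uk` to the clustered index.
CREATE TABLE t2 (
    uk      BINARY(16) NOT NULL,
    payload TEXT,
    UNIQUE KEY (uk)
) ENGINE=InnoDB;

-- Case 3: neither -- InnoDB generates a hidden 6-byte row id and
-- clusters on it; the hash index is then an ordinary secondary index.
CREATE TABLE t3 (
    hash_id BINARY(16),            -- NULLable, so not promoted
    payload TEXT,
    KEY (hash_id)
) ENGINE=InnoDB;
```

Case 2 explains the "secret clustering" in the question: declaring the hash as UNIQUE NOT NULL instead of PRIMARY KEY changes nothing, because InnoDB promotes it anyway.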

Scenario 1. Inserting into a table must find the block where the desired primary key belongs. For AUTO_INCREMENT, and for #3 above, that will be the "last" block in the table. The 100K rows will go into about 1000 blocks at the "end" of the table.

Scenario 2. Otherwise (a non-AUTO_INCREMENT but explicit PK, or a UNIQUE key), a block must be found (possibly read from disk), the key checked for duplicates, and the block updated and marked for later rewriting to disk.

If all the blocks fit in the buffer_pool, then either of those is essentially the same speed. But if the table is too big to be cached, then Scenario 2 becomes slow -- in fact, slower and slower as the table grows. This is because of I/O. GUIDs, UUIDs, MD5s, and other hashes are notorious for suffering from this slowdown.
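A quick way to check whether the table can be cached is to compare its on-disk footprint against the buffer pool (replace `your_db` and `your_table` with your own names):

```sql
-- Buffer pool size in bytes
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Approximate on-disk size of the table (data + indexes), in MB
SELECT table_name,
       (data_length + index_length) / 1024 / 1024 AS size_mb
FROM information_schema.tables
WHERE table_schema = 'your_db'
  AND table_name   = 'your_table';
```

If `size_mb` is well under the buffer pool size, the random-access pattern of a hash key is mostly harmless; once it exceeds the pool, nearly every insert risks a disk read.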

Another issue: transaction integrity dictates that each transaction incurs some additional I/O. Are your 100K inserts 100K transactions? One transaction? Best is to batch them into groups of 100 to 1000 rows per transaction.
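That batching advice can be sketched as follows (table and values are placeholders, assuming the `t1` layout from above):

```sql
-- One transaction per batch, using multi-row INSERT syntax.
START TRANSACTION;
INSERT INTO t1 (id, payload) VALUES
    (UNHEX(MD5('key-1')), 'row 1'),
    (UNHEX(MD5('key-2')), 'row 2');
    -- ... up to ~1000 rows per statement
COMMIT;
```

A single multi-row INSERT inside one transaction pays the per-transaction commit I/O once per batch instead of once per row.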

I hope these principles let you figure out your situation. If not, please provide the CREATE TABLE for each of the options you are considering; then we can discuss the details. Also provide SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; and how much RAM you have.
