
Force hidden clustered index in InnoDB

I have a table whose IDs are a hash of the "true primary key". Correct me if I'm wrong, but I think my inserts into this table are very slow because of the clustered index on this key (inserting 100,000 rows takes several minutes). When I change the key to a nonclustered index, I have the impression that InnoDB still secretly clusters on it.

Is there a simple way to prevent MySQL from clustering on my primary key without having to define an auto-increment primary key?

InnoDB must have a PRIMARY KEY.

  1. InnoDB's first preference is an explicit PRIMARY KEY, whether AUTO_INCREMENT or not.
  2. Failing that, it promotes the first UNIQUE key, but only if none of its columns are NULLable.
  3. Finally, InnoDB will create a hidden, 6-byte integer that acts somewhat like an AUTO_INCREMENT.
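The three cases above can be sketched with table definitions (table and column names here are illustrative, not from the question):

```sql
-- Case 1: explicit PRIMARY KEY -- InnoDB clusters rows on `id`.
CREATE TABLE t1 (
    id      BINARY(16) NOT NULL,   -- e.g. an MD5 hash
    payload TEXT,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Case 2: no PK, but a UNIQUE key over NOT NULL columns --
-- InnoDB silently promotes `uk` to the clustered index.
CREATE TABLE t2 (
    uk      BINARY(16) NOT NULL,
    payload TEXT,
    UNIQUE KEY (uk)
) ENGINE=InnoDB;

-- Case 3: neither -- InnoDB generates a hidden 6-byte row id and
-- clusters on it; the hash index is then an ordinary secondary index.
CREATE TABLE t3 (
    hash_id BINARY(16),            -- NULLable, so not promoted
    payload TEXT,
    KEY (hash_id)
) ENGINE=InnoDB;
```

Case 2 explains the "secret clustering" in the question: declaring the hash as UNIQUE NOT NULL instead of PRIMARY KEY changes nothing, because InnoDB promotes it anyway.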

Scenario 1. Inserting into a table must find the block where the desired primary key belongs. For AUTO_INCREMENT, and for #3 above, that will be the "last" block in the table. The 100K rows will go into about 1000 blocks at the "end" of the table.

Scenario 2. Otherwise (a non-AUTO_INCREMENT but explicit PK, or a UNIQUE key), a block must be found (possibly read from disk), the key checked for duplicates, and the block updated and marked for later rewriting to disk.

If all the blocks fit in the buffer_pool, then either of those is essentially the same speed. But if the table is too big to be cached, then Scenario 2 becomes slow -- in fact, slower and slower as the table grows. This is because of I/O. GUIDs, UUIDs, MD5s, and other hashes are notorious for suffering from this slowdown.
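A quick way to check whether the table can be cached is to compare its on-disk footprint against the buffer pool (replace `your_db` and `your_table` with your own names):

```sql
-- Buffer pool size in bytes
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Approximate on-disk size of the table (data + indexes), in MB
SELECT table_name,
       (data_length + index_length) / 1024 / 1024 AS size_mb
FROM information_schema.tables
WHERE table_schema = 'your_db'
  AND table_name   = 'your_table';
```

If `size_mb` is well under the buffer pool size, the random-access pattern of a hash key is mostly harmless; once it exceeds the pool, nearly every insert risks a disk read.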

Another issue: transaction integrity dictates that each transaction incurs some additional I/O. Are your 100K inserts 100K transactions? One transaction? Best is to batch them into groups of 100 to 1000 rows per transaction.
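That batching advice can be sketched as follows (table and values are placeholders, assuming the `t1` layout from above):

```sql
-- One transaction per batch, using multi-row INSERT syntax.
START TRANSACTION;
INSERT INTO t1 (id, payload) VALUES
    (UNHEX(MD5('key-1')), 'row 1'),
    (UNHEX(MD5('key-2')), 'row 2');
    -- ... up to ~1000 rows per statement
COMMIT;
```

A single multi-row INSERT inside one transaction pays the per-transaction commit I/O once per batch instead of once per row.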

I hope these principles let you figure out your situation. If not, please provide the CREATE TABLE for each of the options you are considering; then we can discuss the details. Also provide SHOW VARIABLES LIKE 'innodb_buffer_pool_size'; and how much RAM you have.
