
InnoDB row size changing exponentially while table is growing?

I have a huge InnoDB table with three columns (int, mediumint, int). The innodb_file_per_table setting is on, and the only index is a PRIMARY KEY on the first two columns.

The table schema is:

CREATE TABLE `big_table` (
  `user_id` int(10) unsigned NOT NULL,
  `another_id` mediumint(8) unsigned NOT NULL,
  `timestamp` int(10) unsigned NOT NULL,
  PRIMARY KEY (`user_id`,`another_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

MySQL version is 5.6.16.

Currently I am multi-inserting over 150 rows per second. There are no deletions and no updates, and no significant rollbacks or other aborted transactions that would cause wasted space.

MySQL shows a calculated size of 75.7 GB for that table.

.ibd size on disk: 136,679,784,448 bytes (127.29 GiB)

Counted rows: 2,901,937,966 (47.10 bytes per row)

Two days later, MySQL still shows a calculated size of 75.7 GB for that table.

.ibd size on disk: 144,263,086,080 bytes (135.35 GiB)

Counted rows: 2,921,284,863 (49.38 bytes per row)
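To make the disproportionate growth concrete, here is a quick sanity check of the two snapshots above (plain arithmetic, no MySQL involved):

```python
# Sizes and row counts from the two snapshots above.
size1, rows1 = 136_679_784_448, 2_901_937_966
size2, rows2 = 144_263_086_080, 2_921_284_863

# Average bytes per row at each snapshot.
avg1 = size1 / rows1  # ~47.10 bytes
avg2 = size2 / rows2  # ~49.38 bytes

# Marginal on-disk cost of each row inserted during the two days.
delta_bytes = size2 - size1   # 7,583,301,632 bytes
delta_rows = rows2 - rows1    # 19,346,897 rows
per_new_row = delta_bytes / delta_rows
print(f"{avg1:.2f} {avg2:.2f} {per_new_row:.0f}")  # 47.10 49.38 392
```

So each newly inserted row is currently costing roughly 392 bytes on disk, an order of magnitude more than the ~47-byte table-wide average.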

Running SHOW TABLE STATUS for the table shows:

Engine | Version | Row_format | Rows       | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Collation 
InnoDB |      10 | Compact    | 2645215723 |             30 | 81287708672 |               0 |            0 |   6291456 | utf8_unicode_ci

Here are my questions:

  • Why is the disk usage growing disproportionately to the row count?
  • Why are Avg_row_length and Data_length completely wrong?

I hope someone can help me stop the disk usage from growing like this. I did not notice this behaviour while the table was smaller.

I am assuming that your table hasn't grown to its present ~2.9 billion rows organically, and that you either recently loaded this data or caused the table to be re-organized (using ALTER TABLE or OPTIMIZE TABLE, for instance). So it starts off quite well packed on disk.

Based on your table schema (which is fortunately very simple and straightforward), each row (record) is laid out as follows:

(Header)              5 bytes
`user_id`             4 bytes
`another_id`          3 bytes
(Transaction ID)      6 bytes
(Rollback Pointer)    7 bytes
`timestamp`           4 bytes
=============================
Total                29 bytes
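The layout above can be checked with simple arithmetic; the field sizes are taken directly from the table in the answer:

```python
# Per-record storage cost for this schema (Compact row format), in bytes.
record = {
    "header": 5,             # record header
    "user_id": 4,            # INT UNSIGNED
    "another_id": 3,         # MEDIUMINT UNSIGNED
    "transaction_id": 6,     # hidden DB_TRX_ID system column
    "rollback_pointer": 7,   # hidden DB_ROLL_PTR system column
    "timestamp": 4,          # INT UNSIGNED
}
total = sum(record.values())
print(total)  # 29
```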

InnoDB will never actually fill pages to more than approximately 15/16 full (and normally never less than 1/2 full). With all of the extra overhead in various places, the fully-loaded cost of a record is somewhere around 32 bytes minimum and 60 bytes maximum per row in the leaf pages of the index.
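As a rough sketch of where those bounds come from (the ~32-byte figure is the 29-byte record plus a few bytes of page-level overhead, as stated above; deriving the upper bound from the ratio of the two fill factors is my reconstruction, not the author's exact accounting):

```python
PAGE_SIZE = 16 * 1024  # default InnoDB page size, 16 KiB
min_cost = 32          # ~29-byte record + page-level overhead, per the answer

# At best (~15/16 full) a record's share of the page is ~32 bytes; at the
# 1/2-full floor the same record's share grows by the ratio of fill factors.
max_cost = min_cost * (15 / 16) / (1 / 2)
print(max_cost)  # 60.0

# Records that fit in one ideally-packed leaf page:
rows_per_full_page = (PAGE_SIZE * 15 // 16) // min_cost
print(rows_per_full_page)  # 480
```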

When you bulk-load data through an import or through an ALTER TABLE or OPTIMIZE TABLE, the data will normally be loaded (and the indexes created) in order by PRIMARY KEY, which allows InnoDB to pack the data on disk very efficiently. If you then continue writing data to the table in random (or effectively random) order, the efficiently-packed index structures must expand to accept the new data, which in B+Tree terms means splitting pages in half. If you have an ideally-packed 16 KiB page where records consume ~32 bytes on average, and it is split in half to insert a single row, you now have two half-empty pages (~16 KiB wasted) and that new row has "cost" 16 KiB.
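The cost of one such split, using the numbers from the paragraph above (a back-of-the-envelope model, not an exact accounting of InnoDB page internals):

```python
PAGE = 16 * 1024  # 16 KiB page
row = 32          # average on-disk cost per record, per the answer

full_rows = (PAGE * 15 // 16) // row   # records in an ideally-packed page: 480

# After a split, those records plus the one new row occupy two pages.
pages_bytes = 2 * PAGE                 # 32,768 bytes now allocated
data_bytes = (full_rows + 1) * row     # 15,392 bytes of actual records
wasted = pages_bytes - data_bytes
print(wasted)  # 17376 bytes (~16 KiB) of free space created by a single insert
```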

Of course that's not really true. Over time the index tree will settle down with pages somewhere between 1/2 full and 15/16 full -- it won't keep splitting pages forever, because the next insert that must happen into the same page will find that plenty of space already exists to do the insert.

This can be a bit disconcerting if you initially bulk-load (and thus efficiently pack) your data into a table and then switch to growing it organically, though. Initially it will seem as though the table is growing at an insane pace, but if you track the growth rate over time it should slow down.

You can read more about InnoDB index and record layout in my blog posts: The physical structure of records in InnoDB, The physical structure of InnoDB index pages, and B+Tree index structures in InnoDB.
