
Should MEDIUMINT be avoided in MySQL?

I came across a comment on the following blog post that recommends against using MEDIUMINT:

Don't use [the 24bit INT], even in MySQL. It's dumb, and it's slow, and the code that implements it is a crawling horror.

4294967295 and MySQL INT(20) Syntax Blows

An answer on Stack Overflow also notes that SQL Server, PostgreSQL and DB2 don't support MEDIUMINT:

What is the difference between tinyint, smallint, mediumint, bigint and int in MySQL?


Should MEDIUMINT be avoided, or should I continue to use it in the cases where it best represents the data I am storing?

InnoDB stores MEDIUMINT as a three-byte value. But when MySQL has to do any computation, the three-byte MEDIUMINT is converted into an eight-byte unsigned long int (I assume nobody runs MySQL on 32 bits nowadays).

There are pros and cons, but you understand that the "It's dumb, and it's slow, and the code that implements it is a crawling horror" reasoning is not technical, right?

I would say MEDIUMINT makes sense when data size on disk is critical, i.e. when a table has so many records that even a one-byte difference (4-byte INT vs 3-byte MEDIUMINT) means a lot. It's a rather rare case, but possible.
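To make the size trade-off concrete, here is a minimal sketch in C. The range constants are the documented MySQL limits for MEDIUMINT; the helper names (fits_mediumint, fits_mediumint_unsigned, bytes_saved_vs_int) and the "copies" parameter are hypothetical and only serve the illustration. The estimate counts raw column bytes and ignores row and page overhead, so treat it as a back-of-envelope figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Documented MySQL value ranges for MEDIUMINT (3 bytes of storage). */
#define MEDIUMINT_MIN          (-8388608)  /* -2^23     */
#define MEDIUMINT_MAX          8388607     /*  2^23 - 1 */
#define MEDIUMINT_UNSIGNED_MAX 16777215    /*  2^24 - 1 */

/* Hypothetical helper: does v fit in a signed MEDIUMINT column? */
static bool fits_mediumint(int64_t v)
{
    return v >= MEDIUMINT_MIN && v <= MEDIUMINT_MAX;
}

/* Hypothetical helper: does v fit in a MEDIUMINT UNSIGNED column? */
static bool fits_mediumint_unsigned(int64_t v)
{
    return v >= 0 && v <= MEDIUMINT_UNSIGNED_MAX;
}

/* Raw bytes saved by storing a value in 3 bytes instead of 4.
 * "copies" is an assumption of this sketch: the number of places each
 * row's value is stored (the table itself plus every index that
 * contains the column). Row and page overhead are ignored. */
static uint64_t bytes_saved_vs_int(uint64_t rows, uint64_t copies)
{
    assert(copies == 0 || rows <= UINT64_MAX / copies); /* no overflow */
    return rows * copies * (4 - 3);
}
```

For example, bytes_saved_vs_int(1000000000, 3) gives 3,000,000,000 bytes, roughly 3 GB for a billion-row table where the column is stored in three places. For a table with a few million rows the saving is a few megabytes, which is why the case is rare.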

mach_read_from_3 and mach_read_from_4, the primitives that InnoDB uses to read numbers from InnoDB records, are similar. They both return ulint. I bet you won't notice a difference on any workload.

Just take a look at the code:

ulint
mach_read_from_3(
/*=============*/
        const byte*     b)      /*!< in: pointer to 3 bytes */
{
        ut_ad(b);
        return( ((ulint)(b[0]) << 16)
                | ((ulint)(b[1]) << 8)
                | (ulint)(b[2])
                );
}

Do you think it's much slower than this?

ulint
mach_read_from_4(
/*=============*/
        const byte*     b)      /*!< in: pointer to four bytes */
{
        ut_ad(b);
        return( ((ulint)(b[0]) << 24)
                | ((ulint)(b[1]) << 16)
                | ((ulint)(b[2]) << 8)
                | (ulint)(b[3])
                );
}

In the grand scheme of things, fetching a row is the big cost. Simple functions, expressions, and, even more so, data formats are insignificant in how long a query takes.

On the other hand, if your dataset is too large to stay cached, the overhead of I/O to fetch rows is even more significant. A crude rule of thumb says that a non-cached row takes 10 times as long as a cached one. Hence, shrinking the dataset (such as by using a smaller *INT) may give you a huge performance benefit.
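The rule of thumb above can be turned into a rough estimate. The sketch below is only a toy model, not a measurement: it assumes a cached fetch costs 1 unit, a non-cached fetch costs miss_penalty units (10 under the rule of thumb), and, for uniform_hit_ratio, that every row is equally likely to be accessed. Both function names are hypothetical.

```c
#include <assert.h>

/* Average cost of one row fetch, in units of one cached fetch.
 * hit_ratio is the fraction of fetches served from cache (0..1);
 * miss_penalty is the relative cost of a non-cached fetch. */
static double avg_row_cost(double hit_ratio, double miss_penalty)
{
    assert(hit_ratio >= 0.0 && hit_ratio <= 1.0);
    return hit_ratio * 1.0 + (1.0 - hit_ratio) * miss_penalty;
}

/* Crude hit-ratio estimate under the (strong) assumption of uniformly
 * random access: the cached fraction of the dataset, capped at 1. */
static double uniform_hit_ratio(double cache_bytes, double data_bytes)
{
    if (data_bytes <= 0.0) {
        return 1.0; /* nothing to read, so nothing can miss */
    }
    double ratio = cache_bytes / data_bytes;
    return ratio > 1.0 ? 1.0 : ratio;
}
```

Under this model, raising the hit ratio from 0.80 to 0.95 lowers the average cost from about 2.8 to about 1.45 units per row, which is the sense in which a smaller dataset can pay off far more than any per-value CPU saving. Real workloads are rarely uniform, so the numbers are indicative only.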

This argument applies to ...INT, FLOAT vs DOUBLE, DECIMAL(m,n), DATETIME(n), etc. (A different discussion is needed for [VAR]CHAR/BINARY(...) and TEXT/BLOB.)

For those with a background in Assembly language...

  • A table is likely to have a mixture of numbers and strings, thereby thwarting attempts to "align" values.
  • MySQL has always handled a variety of hardware (big/little-endian, 16/32/64-bit) with binary compatibility. Note how the code @akuzminsky provided avoids alignment and endian issues. And it lets the compiler deal with 32-bit issues if the hardware is only 16-bit.
  • Code that tests for special cases would probably cost more than it saves compared with simply writing generic code.
  • We are typically talking about less than 1% of the total row-handling time.

Hence, the only sane way to write the code is to work at the byte level, ignore register size, and assume all values are misaligned.
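As an illustration of working at the byte level, here is a small sketch in C. The names store_u24, load_u24 and roundtrip_at_offset are hypothetical and this is not InnoDB's code; it only mirrors the style of the mach_read_from_3 function quoted earlier. Because every access is a single byte, the result does not depend on the host's endianness or on where the value sits in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low 24 bits of v at p, most significant byte first. */
static void store_u24(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 16);
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)v;
}

/* Read back a 24-bit value written by store_u24. */
static uint32_t load_u24(const unsigned char *p)
{
    return ((uint32_t)p[0] << 16)
         | ((uint32_t)p[1] << 8)
         |  (uint32_t)p[2];
}

/* Write v at an arbitrary (possibly odd) offset inside a scratch
 * "row" and read it back. No alignment is required because only
 * single bytes are touched. */
static uint32_t roundtrip_at_offset(size_t offset, uint32_t v)
{
    unsigned char row[16] = {0};
    assert(offset + 3 <= sizeof row); /* stay inside the buffer */
    store_u24(row + offset, v);
    return load_u24(row + offset);
}
```

Calling roundtrip_at_offset(1, 0xABCDEF) or roundtrip_at_offset(5, 0xABCDEF) returns 0xABCDEF on any platform. A version that cast the pointer to a wider integer type would have to worry about both alignment and byte order.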

For optimization, in order of importance:

  1. Count the disk hits. Touching disk is overwhelmingly the most costly part of a query.
  2. Count the number of rows touched. Finding a row (via BTree, etc.) takes some CPU. But note that very few installations are CPU-bound; those that are tend to have poor indexes. (Rule of Thumb: There are typically 100 rows in an InnoDB data or index block.)
  3. Only now does parsing the row come into play.

Rule of Thumb: If a tentative optimization does not (via a back-of-envelope calculation) yield a 10% improvement, don't waste your time on it. Instead, look for some bigger improvement. For example, indexes and summary tables often provide 10x (not just 10%).
