Indexed vs. non-indexed column research
I generated separate MySQL InnoDB tables with 2000, 5000, 10000, 20000, 50000, 100000, and 200000 elements (using a PHP loop and an INSERT query). Each table has two columns: id (PRIMARY KEY, INT, auto-increment) and number (INT, UNIQUE KEY). Then I did the same again, but this time I generated similar tables where the number column doesn't have an index. I generated the tables so that the value of number equals the row's position + 2: the first element is 3, the 1000th element is 1002, and so on. I wanted to test a query like the following, because it will be used in my application:
SELECT count(number) FROM number_two_hundred_I WHERE number=200002;
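For readers who want to reproduce the setup, here is a minimal sketch of the table generation and the test query. It is not the original PHP script: it assumes Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB (so the timings would not be comparable), and the helper names build_table and count_matches are made up for illustration.

```python
import sqlite3


def build_table(conn, name, rows, indexed):
    """Create a two-column table and fill it so that number = position + 2."""
    # UNIQUE gives the column an index; without it the column has no index.
    unique = " UNIQUE" if indexed else ""
    conn.execute(
        f"CREATE TABLE {name} ("
        f"id INTEGER PRIMARY KEY AUTOINCREMENT, number INTEGER{unique})"
    )
    conn.executemany(
        f"INSERT INTO {name} (number) VALUES (?)",
        ((position + 2,) for position in range(1, rows + 1)),
    )
    conn.commit()


def count_matches(conn, name, value):
    """Run the query under test: SELECT count(number) ... WHERE number = ?"""
    cur = conn.execute(
        f"SELECT count(number) FROM {name} WHERE number = ?", (value,)
    )
    return cur.fetchone()[0]


# sqlite3 in-memory database: an assumption, standing in for a MySQL server.
conn = sqlite3.connect(":memory:")
build_table(conn, "number_two_thousand_I", 2000, indexed=True)
build_table(conn, "number_two_thousand", 2000, indexed=False)

# The last row of a 2000-row table holds number = 2002.
indexed_count = count_matches(conn, "number_two_thousand_I", 2002)
plain_count = count_matches(conn, "number_two_thousand", 2002)
print(indexed_count, plain_count)
```

Both tables return the same count; only the access path (index lookup vs. full scan) differs.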
After generating data for these tables, I wanted to measure the time of the worst-case queries. I used SHOW PROFILES for that. I assumed that the worst-case query for each table is the one that looks up the last element, i.e. number equal to 2002, 5002, and so on. Here are the queries I tested and their times (as reported by SHOW PROFILES):
SELECT count(number) FROM number_two_thousand_I WHERE number=2002;
-- Tables whose number column is indexed have the suffix _I
-- at the end of the table name. Time: 0.00099250 s
SELECT count(number) FROM number_two_thousand WHERE number=2002;
-- The number column is not indexed when there is no suffix _I.
-- Time: 0.00226275 s
SELECT count(number) FROM number_five_thousand_I WHERE number=5002;
-- Time: 0.00095600 s
SELECT count(number) FROM number_five_thousand WHERE number=5002;
-- Time: 0.00404125 s
So here are the results:
Elements   Indexed (s)   Not indexed (s)
2000       0.00099250    0.00226275
5000       0.00095600    0.00404125
10000      0.00156900    0.00761750
Here is an infographic for that. It shows how the worst-case query time depends on the number of elements for the indexed and non-indexed column. The indexed case is shown in red.
When I tested the speed, I typed the same query in the mysql console twice, because I found that the first time a query is run, the query on the non-indexed column can sometimes be even a bit faster than the one on the indexed column.
The question is: why does this type of query sometimes take less time for 200000 elements than for 100000 elements when the number column is indexed? You can see that there are other results that I find unpredictable. I ask because when the number column is not indexed, the results are quite predictable: the time for 200000 elements is always greater than for 100000. Please tell me what I'm doing wrong in my research on the UNIQUE indexed column.
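In the non-indexed case it is always a full table scan, so the time tracks the number of rows closely. If the column is indexed, you are measuring the index lookup time, which is constant in your case (small numbers, small deviations).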
It is not the "worst" case. Make the table bigger than can be cached in RAM, and make the UNIQUE key random instead of being in lock step with the PK. An example of such a key is UUID(). If you do both of those, you will eventually see the performance slow down significantly.
UNIQUE keys have the following impact on INSERTs: the uniqueness constraint is checked before returning to the client. For a non-UNIQUE index, the work to insert into the index's BTree can be (and is) delayed (cf. "Change buffer"). With no index on the second column, there is even less work to do.
WHERE number=2002 --
UNIQUE(number) -- Drill down the BTree. Very fast, very efficient.
INDEX(number) -- Drill down the BTree. Very fast, very efficient.
No index on number -- Scan the entire table. So the cost depends on table size, not the value of number. It has no clue whether 2002 exists anywhere in the table, or how many times.
I suggest you use log-log 'paper' for your graph. Anyway, note how linear the non-indexed case is.
And the indexed case is essentially constant. Finding number=200002 is just as cheap as finding number=2002. This applies to both UNIQUE and INDEX. (Actually, there is a very slight rise in the line because a BTree is really O(log n), not O(1). For 2K rows, there are probably 2 levels in the BTree; for 200K, 3 levels.)
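The level counts can be sanity-checked with a rough estimate. The sketch below assumes a fan-out of 100 entries per BTree node, which is only an illustrative rule of thumb and not a measured InnoDB value (the real fan-out depends on key size and page size); the function name estimated_btree_levels is made up for this example.

```python
def estimated_btree_levels(row_count, fanout=100):
    """Estimate how many BTree levels are needed to address row_count rows.

    fanout is an assumed number of entries per node, not an InnoDB constant.
    """
    levels = 1
    capacity = fanout  # rows reachable with a single level
    while capacity < row_count:
        capacity *= fanout  # each extra level multiplies the reach
        levels += 1
    return levels


for rows in (2_000, 200_000):
    print(rows, estimated_btree_levels(rows))
```

Under that assumption, growing the table 100-fold adds a single level, which is why the indexed line barely rises.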
The Query cache can trip you up in timings (if it is turned on). When timing, do SELECT SQL_NO_CACHE ... to avoid the QC. If the QC is on and applies, then the second and subsequent runs of the identical query will take very close to 0.000 seconds.
Those timings that varied between 0.5ms and 1.2ms -- chalk it up to the phase of the moon. Seriously, any timing below 10ms should not be trusted. This is because of all the other things that may be happening on the computer at the same time. You can temper it somewhat by averaging multiple runs -- being sure to avoid (1) the Query cache, and (2) I/O.
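Averaging multiple runs could look roughly like the sketch below. It assumes a hypothetical run_query callable that executes the statement through whatever MySQL client is in use; here a dummy workload stands in for it, and the helper names time_runs and summarize are made up. The warm-up run is discarded so that the first, cold execution does not skew the average.

```python
import statistics
import time


def time_runs(run_query, repeats=20, warmups=1):
    """Time repeated calls to run_query and return the samples in seconds."""
    for _ in range(warmups):
        run_query()  # warm caches; this run is not measured
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Mean, median, and spread; a large spread means the mean is mostly noise."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),
    }


# Dummy workload standing in for the real query call (an assumption).
samples = time_runs(lambda: sum(range(10_000)))
summary = summarize(samples)
print(summary)
```

Comparing the standard deviation with the difference between two tables' means gives a quick sense of whether a gap such as 0.5ms vs. 1.2ms is distinguishable from noise.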
As for I/O... This gets back to my earlier comment about what may happen when the table (and/or index) is bigger than can be cached in RAM.
Your tags are, technically, incorrect. Most of MySQL's indexes are BTrees (actually B+Trees), not Binary Trees. (Sure, there is a lot of similarity, and many of the principles are shared.)
Back to your research goal.
The main cost in performing any SELECT is how many rows it touches. With your UNIQUE index, it is touching 1 row. So expect fast and O(1) (plus noise).
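The "rows touched" idea can be illustrated outside the database. The sketch below is a toy model, not how InnoDB is implemented: a Python list stands in for the table, a linear pass stands in for the full scan, and a binary search over sorted values stands in for drilling down a BTree. The function names full_scan and index_lookup are made up for this example.

```python
def full_scan(values, target):
    """No index: every row is examined, whatever the target is."""
    touched = 0
    matches = 0
    for value in values:
        touched += 1
        if value == target:
            matches += 1
    return matches, touched


def index_lookup(sorted_values, target):
    """Index stand-in: binary search over sorted, unique values."""
    low, high = 0, len(sorted_values)
    probes = 0
    while low < high:
        mid = (low + high) // 2
        probes += 1
        if sorted_values[mid] < target:
            low = mid + 1
        else:
            high = mid
    found = low < len(sorted_values) and sorted_values[low] == target
    return (1 if found else 0), probes


for rows in (2_000, 200_000):
    values = [position + 2 for position in range(1, rows + 1)]
    target = rows + 2  # the last element, as in the question
    scan_matches, scan_touched = full_scan(values, target)
    lookup_matches, lookup_probes = index_lookup(values, target)
    print(rows, scan_touched, lookup_probes)
```

The scan's work grows in step with the table, while the lookup's probe count grows only logarithmically, which matches the shape of the two lines in the graph.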