
Why does MySQL not always use an index for a SELECT query?

I have two tables in my database, users and articles.

The records in my users and articles tables are given below:

+----+--------+
| id | name   |
+----+--------+
|  1 | user1  |
|  2 | user2  |
|  3 | user3  |
+----+--------+


+----+---------+----------+
| id | user_id | article  |
+----+---------+----------+
|  1 |       1 | article1 |
|  2 |       1 | article2 |
|  3 |       1 | article3 |
|  4 |       2 | article4 |
|  5 |       2 | article5 |
|  6 |       3 | article6 |
+----+---------+----------+

Given below are the queries and their respective EXPLAIN output.

EXPLAIN SELECT * FROM articles WHERE user_id = 1;

+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | articles | NULL       | ALL  | user_id       | NULL | NULL    | NULL |    6 |    50.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+



EXPLAIN SELECT * FROM articles WHERE user_id = 2;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table    | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | articles | NULL       | ref  | user_id       | user_id | 5       | const |    2 |   100.00 | NULL  |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+


EXPLAIN SELECT * FROM articles WHERE user_id = 3;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table    | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | articles | NULL       | ref  | user_id       | user_id | 5       | const |    1 |   100.00 | NULL  |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+

Looking at the EXPLAIN plans for my select queries, it seems that the queries do not always use the index.

Specifically:

When user_id is 1, it doesn't use the key and scans the complete table.

Otherwise, it uses the user_id key and scans only a few rows.

Could you please explain why the queries don't always use the index here?

There are (probably) two BTrees involved in the queries you show. One BTree holds the data, sorted by the PRIMARY KEY, which I assume is id. The other is for the INDEX on user_id (again, I am guessing). When InnoDB (which I assume you are using) builds a "secondary index", such as INDEX(user_id), it silently tacks on the PK of the table. So, effectively, it becomes a BTree containing two columns, (user_id, id), sorted by that pair.
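To make the two structures concrete, here is a minimal Python sketch that models them with the sample rows from the question. A dict and a sorted list stand in for the real BTrees, and the names `data_btree` and `secondary_index` are made up for illustration; this is not how InnoDB stores anything internally.

```python
# Sample rows from the question's articles table: (id, user_id, article).
rows = [
    (1, 1, "article1"),
    (2, 1, "article2"),
    (3, 1, "article3"),
    (4, 2, "article4"),
    (5, 2, "article5"),
    (6, 3, "article6"),
]

# Stand-in for the data BTree: full rows keyed (and ordered) by the primary key.
data_btree = {row_id: (row_id, user_id, article)
              for row_id, user_id, article in sorted(rows)}

# Stand-in for INDEX(user_id): the primary key is appended to each entry,
# so the entries are (user_id, id) pairs sorted by that pair.
secondary_index = sorted((user_id, row_id) for row_id, user_id, _ in rows)

print(secondary_index)
# [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]
```

Note that the index entries carry no article text, which is why a SELECT * has to go back to the data BTree for each match.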

When the Optimizer looks at SELECT * FROM t WHERE user_id=?, it probes the table and discovers that "a lot" of rows have user_id = 1 and not many rows have the other values you tried.

The Optimizer has two (or more) ways to evaluate queries like that:

Plan A (use the index). Here's what it does:

  1. Drill down the Index's BTree to find the first occurrence of user_id=2.
  2. There it will find an id.
  3. Use that id to drill down the data's BTree to find * (as in SELECT *).
  4. Move on to the next entry in the Index BTree. (This is actually rather efficient since it is really a "B+Tree"; see Wikipedia.)
  5. If found, loop back to step 2. If not found (no more index entries with user_id=2), exit.
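The Plan A steps can be sketched in Python roughly as follows. This is only a model under stated assumptions: `ARTICLES` (a dict keyed by id) stands in for the data BTree, `INDEX_USER_ID` (a sorted list of (user_id, id) pairs) stands in for the secondary index, and `plan_a` is a hypothetical name, not anything MySQL exposes.

```python
from bisect import bisect_left

# Stand-in for the data BTree: rows keyed by primary key id.
ARTICLES = {
    1: (1, 1, "article1"),
    2: (2, 1, "article2"),
    3: (3, 1, "article3"),
    4: (4, 2, "article4"),
    5: (5, 2, "article5"),
    6: (6, 3, "article6"),
}

# Stand-in for INDEX(user_id): sorted (user_id, id) pairs.
INDEX_USER_ID = sorted((row[1], row[0]) for row in ARTICLES.values())


def plan_a(user_id):
    """Walk the matching index entries and fetch each full row by id."""
    result = []
    # Step 1: locate the first index entry for this user_id.
    pos = bisect_left(INDEX_USER_ID, (user_id,))
    # Steps 2-5: for each matching entry, take its id, look up the full
    # row in the data structure, then advance to the next index entry.
    while pos < len(INDEX_USER_ID) and INDEX_USER_ID[pos][0] == user_id:
        _, row_id = INDEX_USER_ID[pos]
        result.append(ARTICLES[row_id])
        pos += 1
    return result


print(plan_a(2))
# [(4, 2, 'article4'), (5, 2, 'article5')]
```

Each matching row costs one extra lookup in `ARTICLES`; that per-row lookup is the part that gets expensive when many rows match.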

Plan B (don't use the index; useful for your user_id=1):

  1. Simply walk through the data BTree in whatever order.
  2. Skip any row that does not have user_id=1.
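Plan B is simpler to model. In this Python sketch a dict named `ARTICLES` stands in for the data BTree, and `plan_b` is a hypothetical name used only for illustration; it also counts how many rows it had to look at.

```python
# Stand-in for the data BTree: rows keyed by primary key id.
ARTICLES = {
    1: (1, 1, "article1"),
    2: (2, 1, "article2"),
    3: (3, 1, "article3"),
    4: (4, 2, "article4"),
    5: (5, 2, "article5"),
    6: (6, 3, "article6"),
}


def plan_b(user_id):
    """Full scan: read every row once and keep the ones that match."""
    examined = 0
    result = []
    for row in ARTICLES.values():
        examined += 1
        if row[1] == user_id:  # the "Using where" filter
            result.append(row)
    return result, examined


matches, examined = plan_b(1)
print(len(matches), examined)
# 3 6
```

The scan always examines all 6 rows, which lines up with rows = 6 in the EXPLAIN output for user_id = 1, but it never needs a second lookup per row.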

The bouncing back and forth between the two BTrees costs something. The Optimizer decided your =1 case would need to look at more than about 20% of the table and decided that Plan B would be faster. That is, it deliberately ignored the INDEX.
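A toy cost comparison shows the shape of that decision. Everything here is an assumption for illustration: the two cost constants are invented so that the break-even lands at the "about 20%" mentioned above, the table sizes are hypothetical, and `choose_plan` is not a MySQL function. The real cost model weighs more factors and does not reduce to a fixed percentage, especially on a table as small as the one in the question.

```python
# Hypothetical relative costs, chosen so the break-even is at 20%.
SCAN_COST_PER_ROW = 1.0    # reading one row while walking the data BTree
LOOKUP_COST_PER_ROW = 5.0  # one index entry plus one drill-down into the data BTree


def choose_plan(matching_rows, total_rows):
    """Pick the cheaper plan under the toy cost figures above."""
    index_cost = matching_rows * LOOKUP_COST_PER_ROW
    scan_cost = total_rows * SCAN_COST_PER_ROW
    if index_cost < scan_cost:
        return "plan A (index)"
    return "plan B (table scan)"


print(choose_plan(1_000, 1_000_000))    # plan A (index)
print(choose_plan(500_000, 1_000_000))  # plan B (table scan)
```

The point is only that the index wins when few rows match and loses when many do; the exact crossover depends on costs the Optimizer estimates rather than on the constants used here.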

There are a lot of factors that the Optimizer can't or doesn't estimate correctly, but generally picking between these two Plans leads to faster execution. (Your table is too small to reliably measure a difference.)

Other "Plans": If the index is "covering", there is no need to use the data BTree. If there is an ORDER BY that the index can satisfy, then the Optimizer will probably use Plan A to avoid the "filesort". (See EXPLAIN SELECT ...) Etc.
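The "covering" case can be modeled the same way. Assuming the secondary index holds (user_id, id) pairs, a query that asks only for those two columns can be answered from the index alone. In this sketch `INDEX_USER_ID` and `select_user_id_and_id` are hypothetical names, and a sorted list stands in for the index BTree.

```python
from bisect import bisect_left

# Stand-in for INDEX(user_id) with the primary key appended: (user_id, id).
INDEX_USER_ID = [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]


def select_user_id_and_id(user_id):
    """Answer a query for (user_id, id) using only the index entries."""
    out = []
    pos = bisect_left(INDEX_USER_ID, (user_id,))
    while pos < len(INDEX_USER_ID) and INDEX_USER_ID[pos][0] == user_id:
        # Both requested columns are already in the entry, so there is
        # no drill-down into the data structure at all.
        out.append(INDEX_USER_ID[pos])
        pos += 1
    return out


print(select_user_id_and_id(1))
# [(1, 1), (1, 2), (1, 3)]
```

Because the per-row lookup disappears, the main cost of using the index goes away even when many rows match, which is why a covering index changes the trade-off.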
