Why does MySQL not always use an index for a SELECT query?
I have two tables in my database: users and articles.
Records in my users and articles tables are given below:
+----+--------+
| id | name |
+----+--------+
| 1 | user1 |
| 2 | user2 |
| 3 | user3 |
+----+--------+
+----+---------+----------+
| id | user_id | article |
+----+---------+----------+
| 1 | 1 | article1 |
| 2 | 1 | article2 |
| 3 | 1 | article3 |
| 4 | 2 | article4 |
| 5 | 2 | article5 |
| 6 | 3 | article6 |
+----+---------+----------+
Given below are the queries and their respective EXPLAIN output.
EXPLAIN SELECT * FROM articles WHERE user_id = 1;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | articles | NULL | ALL | user_id | NULL | NULL | NULL | 6 | 50.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
EXPLAIN SELECT * FROM articles WHERE user_id = 2;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | articles | NULL | ref | user_id | user_id | 5 | const | 2 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
EXPLAIN SELECT * FROM articles WHERE user_id = 3;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | articles | NULL | ref | user_id | user_id | 5 | const | 1 | 100.00 | NULL |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+----------+-------+
Looking at the EXPLAIN plans for my select queries, it seems that the queries do not always use the index.
When user_id is 1, the query doesn't use the key and scans the complete table. Otherwise, it uses the user_id key and scans only a few rows.
Could you please explain why queries don't always use the index here?
There are (probably) two BTrees involved in the queries you show. One BTree holds the data, sorted by the PRIMARY KEY, which I assume is id. The other is for the INDEX on user_id (again, I am guessing). When InnoDB (which I assume you are using) builds a "secondary index", such as INDEX(user_id), it silently tacks on the PK of the table. So, effectively it becomes a BTree containing two columns, (user_id, id), sorted by that pair.
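The key_len of 5 in the EXPLAIN output is consistent with a nullable INT column (4 bytes plus 1 byte for the NULL flag), and possible_keys shows that the index is named user_id. A minimal sketch of a table that would match these guesses follows; the column types, the VARCHAR length, and the engine are assumptions, not something the question states.

```sql
-- Assumed schema: types and lengths are guesses, not taken from the question.
CREATE TABLE articles (
    id      INT NOT NULL AUTO_INCREMENT,
    user_id INT NULL,                 -- nullable INT gives key_len = 5
    article VARCHAR(100) NOT NULL,    -- length chosen arbitrarily
    PRIMARY KEY (id),                 -- the data BTree is ordered by this
    INDEX user_id (user_id)           -- InnoDB stores this as (user_id, id)
) ENGINE=InnoDB;

-- The six rows shown in the question.
INSERT INTO articles (id, user_id, article) VALUES
    (1, 1, 'article1'),
    (2, 1, 'article2'),
    (3, 1, 'article3'),
    (4, 2, 'article4'),
    (5, 2, 'article5'),
    (6, 3, 'article6');
```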
When the Optimizer looked at SELECT * FROM t WHERE user_id=?, it probed the table and discovered that "a lot" of rows had user_id = 1 and not many rows had the other values you tried.
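To see the distribution the Optimizer is reacting to, you can count the rows per user_id yourself. This is only an illustration against the articles table from the question; the Optimizer works from its own estimates, not from a query like this.

```sql
-- Rows per user_id, and the share of the table each value accounts for.
SELECT user_id,
       COUNT(*) AS row_count,
       COUNT(*) / (SELECT COUNT(*) FROM articles) AS fraction_of_table
FROM articles
GROUP BY user_id
ORDER BY user_id;
```

With the six rows shown in the question, user_id 1 accounts for 3 of the 6 rows (half the table), while user_id 2 and 3 account for 2 rows and 1 row.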
The Optimizer has two (or more) ways to evaluate queries like that:
Plan A (use the index). Here's what it does:
1. Drill down the index's BTree to the first occurrence of user_id=2.
2. There, find the id.
3. Use that id to drill down the data's BTree to find * (as in SELECT *).
4. Move to the next index entry and repeat from step 2; if there are no more (no further index entries for user_id=2), exit.
Plan B (don't use the index; useful for your user_id=1):
1. Scan the data BTree, keeping only the rows with user_id=1.
The bouncing back and forth between the two BTrees costs something. The Optimizer decided your =1 case would need to look at more than about 20% of the table and decided that Plan B would be faster. That is, it deliberately ignored the INDEX.
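If you want to compare the two plans yourself, MySQL's index hints let you push the Optimizer toward one or the other. The sketch below assumes the index is named user_id, as possible_keys suggests. Hints like these are for experimentation, not something to leave in production queries.

```sql
-- Push the =1 query toward Plan A (index lookup plus data lookups).
EXPLAIN SELECT * FROM articles FORCE INDEX (user_id) WHERE user_id = 1;

-- Push the =2 query toward Plan B (scan the data BTree).
EXPLAIN SELECT * FROM articles IGNORE INDEX (user_id) WHERE user_id = 2;
```

Comparing the type, key, and rows columns with the unhinted output shows which plan each query would otherwise have taken.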
There are a lot of factors that the Optimizer can't or doesn't estimate correctly, but generally picking between these two Plans leads to faster execution. (Your table is too small to reliably measure a difference.)
Other "Plans": If the index is "covering", there is no need to use the data BTree. If there is an ORDER BY that can use the index, then the Optimizer will probably use Plan A to avoid the "filesort". (See EXPLAIN SELECT...) Etc.
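Two queries that could exercise these other plans are sketched below, assuming InnoDB and the (user_id, id) secondary index described above. I have not run them against your table, so treat the comments as what one would typically expect, not as guaranteed output.

```sql
-- Covering case: only user_id and id are requested, and both live in the
-- secondary index, so the data BTree need not be touched.
-- EXPLAIN typically reports "Using index" in Extra for a query like this.
EXPLAIN SELECT id, user_id FROM articles WHERE user_id = 1;

-- ORDER BY case: within a single user_id, the index entries are already
-- ordered by id, so reading through the index can avoid a separate sort.
EXPLAIN SELECT * FROM articles WHERE user_id = 1 ORDER BY id;
```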