MySQL: Optimizing selection of rows from multiple ranges (using indexes?)
My table (projects):
id, lft, rgt
1, 1, 6
2, 2, 3
3, 4, 5
4, 7, 10
5, 8, 9
6, 11, 12
7, 13, 14
As you may have noticed, this is hierarchical data using the nested set model. Tree pretty-printed:
1
  2
  3
4
  5
6
7
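For reference, an indented tree like this can be derived from lft and rgt alone. This is only a minimal sketch, assuming the seven sample rows above; the aliases node and parent are made up for the example. It counts how many rows enclose each node (including the node itself) and indents by that depth:

```sql
-- Sketch: print each project id indented by its depth in the nested set.
-- COUNT(parent.id) counts the node itself plus all of its ancestors,
-- so depth = COUNT(parent.id) - 1.
SELECT CONCAT(REPEAT('  ', COUNT(parent.id) - 1), node.id) AS tree
FROM projects AS node
JOIN projects AS parent
  ON node.lft BETWEEN parent.lft AND parent.rgt
GROUP BY node.id, node.lft
ORDER BY node.lft;
```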
I want to select all subprojects under projects 1 and 4. I can do this with:
SELECT p.id
FROM projects AS p, projects AS ps
WHERE (ps.id = 1 OR ps.id = 4)
AND p.lft BETWEEN ps.lft AND ps.rgt
However, this is very slow on a large table. When I run EXPLAIN on the query, I get:
+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+
| 1 | SIMPLE | ps | range | PRIMARY,lft,rgt,lftRgt | PRIMARY | 4 | NULL | 2 | Using where |
| 1 | SIMPLE | p | ALL | lft,lftRgt | NULL | NULL | NULL | 7040 | Range checked for each record (index map: 0x12) |
+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+
(The projects table has indexes on lft, rgt, and lft-rgt. As you can see, MySQL does not use any index and loops through all 7040 records.)
I have found that if I select for only one of the super projects, MySQL manages to use the indexes:
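For anyone who wants to reproduce this, here is a sketch of what the table might look like. The column types are an assumption inferred from the EXPLAIN output (a key_len of 4 suggests NOT NULL INT columns), and the index names are the ones listed under possible_keys; the real table may differ.

```sql
-- Assumed schema, reconstructed from the EXPLAIN output; not the asker's DDL.
CREATE TABLE projects (
    id  INT NOT NULL PRIMARY KEY,
    lft INT NOT NULL,
    rgt INT NOT NULL,
    KEY lft (lft),
    KEY rgt (rgt),
    KEY lftRgt (lft, rgt)
);

-- The seven sample rows from the question.
INSERT INTO projects (id, lft, rgt) VALUES
    (1, 1, 6), (2, 2, 3), (3, 4, 5),
    (4, 7, 10), (5, 8, 9),
    (6, 11, 12), (7, 13, 14);
```

With only seven rows the optimizer will most likely pick different plans than it does on the 7040-row table, so this is useful for checking results, not for timing.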
SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 1
AND p.lft BETWEEN ps.lft AND ps.rgt
EXPLAINs to:
+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | ps | const | PRIMARY,lft,rgt,lftRgt | PRIMARY | 4 | const | 1 | |
| 1 | SIMPLE | p | range | lft,lftRgt | lft | 4 | NULL | 7 | Using where |
+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+
Finally, my question: is there any way I can SELECT rows matching multiple ranges and still benefit from indexes?
Have you tried a union? Take your second example, add "union" underneath, and then repeat it but matching id 4. I don't know if it would work, but it seems like an obvious thing to try.
Edit:
SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 1
AND p.lft BETWEEN ps.lft AND ps.rgt
UNION
SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 4
AND p.lft BETWEEN ps.lft AND ps.rgt
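A possible refinement, not part of the suggestion above: in a nested set, the subtrees of two projects cannot overlap unless one project is nested inside the other. Assuming that is the case here (projects 1 and 4 are not nested in the sample data), UNION ALL should return the same rows while skipping the duplicate-elimination step that plain UNION performs:

```sql
-- Sketch: same two branches, but without de-duplication.
-- Only safe when neither parent project is a descendant of the other.
SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 1
AND p.lft BETWEEN ps.lft AND ps.rgt
UNION ALL
SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 4
AND p.lft BETWEEN ps.lft AND ps.rgt;
```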
From 7.2.5.1. The Range Access Method for Single-Part Indexes in the MySQL reference manual:
Currently, MySQL does not support merging multiple ranges for the range access method for spatial indexes. To work around this limitation, you can use a UNION with identical SELECT statements, except that you put each spatial predicate in a different SELECT.
So you need a union of two different selects.
Your query does merge the multiple ranges.
It uses a range access method to combine the multiple ranges on ps (which is leading in the join, as the first row of your EXPLAIN shows).
For each row returned from ps, it checks the best method to retrieve all rows from p for the given values of ps.lft and ps.rgt. Depending on the query selectivity, this may be either a fullscan over p or an index lookup over one of the two possible indexes.
The number of rows shown in the EXPLAIN means nothing: the EXPLAIN just shows the worst possible outcome. It doesn't necessarily mean that all these rows will be examined. Whether they will be or not, the optimizer can only tell at runtime.
The documentation snippet about the impossibility of merging multiple ranges is only valid for SPATIAL indexes (R-Tree indexes, the ones you create over GEOMETRY types). These indexes are good for queries that search upwards (the ancestors of a given project) but not downwards.
A plain B-Tree index can combine the multiple ranges. From the documentation:
For all types of indexes, multiple range conditions combined with OR or AND form a range condition.
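To illustrate the quoted rule: when the bounds are literals rather than values read from a joined row, two ranges on lft joined by OR form a single range condition. A sketch, assuming the bounds of projects 1 and 4 (1-6 and 7-10 in the sample data) were fetched beforehand with a separate query. Whether the lft index is actually chosen depends on your data and MySQL version, so check the EXPLAIN output yourself:

```sql
-- Step 1 (run first, in the application): look up the bounds.
SELECT lft, rgt FROM projects WHERE id IN (1, 4);

-- Step 2: plug the returned bounds in as constants.
-- The two BETWEEN predicates combined with OR are a range condition on lft.
EXPLAIN
SELECT id
FROM projects
WHERE lft BETWEEN 1 AND 6
   OR lft BETWEEN 7 AND 10;
```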
The real problem is that the optimizer in MySQL cannot make a single correct decision up front: either use a single fullscan (with ps leading), or make several range scans.
Say you have 10,000 rows and your project boundaries are 0-500 and 2000-2500. The optimizer will see that each boundary will benefit from the index, so the range check will result in two range accesses, while a single fullscan would be better.
It may be even worse if your project boundaries are, say, 0-3000 and 5000-6000. In this case the optimizer will make two fullscans, while one would suffice.
To help the optimizer make the correct decision, you should create a covering index on (lft, id), in this order:
CREATE INDEX ix_lft_id ON projects (lft, id)
The tipping point for using a fullscan over a covering index rather than a range condition is 90%, which means you will never have more than one fullscan in your actual plan.
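One way to check the effect, sketched under the assumption that ix_lft_id has been created as shown: rerun EXPLAIN on the original query and see whether the index appears in the key column for p. The FORCE INDEX hint is standard MySQL syntax but is my addition for comparing plans, not something the answer above prescribes; the join is also rewritten with explicit JOIN syntax so the hint attaches clearly to p.

```sql
-- Plan without any hint: does the optimizer pick ix_lft_id on its own?
EXPLAIN
SELECT p.id
FROM projects AS ps
JOIN projects AS p
  ON p.lft BETWEEN ps.lft AND ps.rgt
WHERE ps.id IN (1, 4);

-- Plan with the hint, for comparison only.
EXPLAIN
SELECT p.id
FROM projects AS ps
JOIN projects AS p FORCE INDEX (ix_lft_id)
  ON p.lft BETWEEN ps.lft AND ps.rgt
WHERE ps.id IN (1, 4);
```

If the two plans differ noticeably in practice, time both variants on the real table before keeping the hint, since hints can become counterproductive as the data changes.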