
MySQL: Optimizing selecting rows from multiple ranges (using indexes?)

My table (projects):

id, lft, rgt
1, 1, 6
2, 2, 3
3, 4, 5
4, 7, 10
5, 8, 9
6, 11, 12
7, 13, 14

As you may have noticed, this is hierarchical data using the nested set model. Here is the tree, pretty-printed:

1
 2
 3
4
 5
6
7
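The containment rule behind the nested set model can be checked in plain Python against the sample rows above. A project's subtree is every row whose lft lies between that project's lft and rgt, inclusive. This is a minimal sketch: the `PROJECTS` dict and the `subtree`/`is_leaf` helpers are illustrative names, not part of the original schema.

```python
# Sample rows from the projects table above: id -> (lft, rgt)
PROJECTS = {
    1: (1, 6),
    2: (2, 3),
    3: (4, 5),
    4: (7, 10),
    5: (8, 9),
    6: (11, 12),
    7: (13, 14),
}


def subtree(root_id):
    """Return root_id plus all of its descendants (nested set containment)."""
    lo, hi = PROJECTS[root_id]
    return {pid for pid, (lft, _rgt) in PROJECTS.items() if lo <= lft <= hi}


def is_leaf(node_id):
    """In a nested set, a leaf has rgt == lft + 1 (no room for children)."""
    lft, rgt = PROJECTS[node_id]
    return rgt == lft + 1


print(sorted(subtree(1)))               # [1, 2, 3]
print(sorted(subtree(4)))               # [4, 5]
print(sorted(subtree(1) | subtree(4)))  # [1, 2, 3, 4, 5]
```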

I want to select all subprojects under projects 1 and 4. I can do this with:

SELECT p.id
FROM projects AS p, projects AS ps
WHERE (ps.id = 1 OR ps.id = 4)
AND p.lft BETWEEN ps.lft AND ps.rgt
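To see what this self-join returns on the sample data, the sketch below runs it through Python's built-in `sqlite3` module. SQLite is used purely as a stand-in for MySQL, and the only change to the query is an added ORDER BY. SQLite's planner is different, so this confirms the result set, not MySQL's execution plan.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (same columns and sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, lft INT, rgt INT)")
conn.executemany(
    "INSERT INTO projects (id, lft, rgt) VALUES (?, ?, ?)",
    [(1, 1, 6), (2, 2, 3), (3, 4, 5), (4, 7, 10), (5, 8, 9), (6, 11, 12), (7, 13, 14)],
)

# The question's query, with only an ORDER BY added for a stable result.
rows = conn.execute(
    """
    SELECT p.id
    FROM projects AS p, projects AS ps
    WHERE (ps.id = 1 OR ps.id = 4)
      AND p.lft BETWEEN ps.lft AND ps.rgt
    ORDER BY p.id
    """
).fetchall()
ids = [r[0] for r in rows]
print(ids)  # [1, 2, 3, 4, 5]
```

The join produces one row per matching (p, ps) pair. If one selected project lies inside another (for example ids 1 and 2), the same child therefore appears twice; adding DISTINCT avoids that.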

However, this is very slow on a large table. Running EXPLAIN on the query, I get:

+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type  | possible_keys          | key     | key_len | ref  | rows | Extra                                           |
+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+
|  1 | SIMPLE      | ps    | range | PRIMARY,lft,rgt,lftRgt | PRIMARY | 4       | NULL |    2 | Using where                                     | 
|  1 | SIMPLE      | p     | ALL   | lft,lftRgt             | NULL    | NULL    | NULL | 7040 | Range checked for each record (index map: 0x12) | 
+----+-------------+-------+-------+------------------------+---------+---------+------+------+-------------------------------------------------+

(The projects table has indexes on lft, on rgt, and on (lft, rgt). As you can see, MySQL does not use any index for p and loops through all 7040 records.)

I have found that if I select only one of the parent projects, MySQL manages to use the indexes:

SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 1
AND p.lft BETWEEN ps.lft AND ps.rgt

EXPLAIN gives:

+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys          | key     | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+
|  1 | SIMPLE      | ps    | const | PRIMARY,lft,rgt,lftRgt | PRIMARY | 4       | const |    1 |             | 
|  1 | SIMPLE      | p     | range | lft,lftRgt             | lft     | 4       | NULL  |    7 | Using where | 
+----+-------------+-------+-------+------------------------+---------+---------+-------+------+-------------+

Finally, my question: is there any way I can SELECT rows matching multiple ranges and still benefit from indexes?

Have you tried a UNION? Take your second example, add UNION underneath, and then repeat it but matching id 4. I don't know if it would work, but it seems like an obvious thing to try.

Edit:

SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 1
AND p.lft BETWEEN ps.lft AND ps.rgt
UNION
SELECT p.id
FROM projects AS p, projects AS ps
WHERE ps.id = 4
AND p.lft BETWEEN ps.lft AND ps.rgt
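The same UNION can be generated for any number of parent projects. This is a sketch, again using `sqlite3` as a stand-in for MySQL; `subprojects_union` and `BRANCH` are hypothetical names. Each branch filters ps on a single id, which matches the single-project query that MySQL planned with a const lookup plus a range scan.

```python
import sqlite3

# In-memory SQLite stand-in for the projects table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, lft INT, rgt INT)")
conn.executemany(
    "INSERT INTO projects (id, lft, rgt) VALUES (?, ?, ?)",
    [(1, 1, 6), (2, 2, 3), (3, 4, 5), (4, 7, 10), (5, 8, 9), (6, 11, 12), (7, 13, 14)],
)

# One branch per parent project: a single equality on ps.id, like the fast query.
BRANCH = (
    "SELECT p.id FROM projects AS p, projects AS ps "
    "WHERE ps.id = ? AND p.lft BETWEEN ps.lft AND ps.rgt"
)


def subprojects_union(conn, parent_ids):
    """Build and run 'BRANCH UNION BRANCH ...' with one branch per parent id."""
    parent_ids = list(parent_ids)
    if not parent_ids:
        return []
    sql = " UNION ".join([BRANCH] * len(parent_ids)) + " ORDER BY 1"
    return [row[0] for row in conn.execute(sql, parent_ids)]


print(subprojects_union(conn, [1, 4]))  # [1, 2, 3, 4, 5]
```

UNION (rather than UNION ALL) also removes duplicates when one selected project is nested inside another. With a MySQL driver such as PyMySQL or mysql-connector-python, the placeholder would be `%s` instead of `?`.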

From section 7.2.5.1, "The Range Access Method for Single-Part Indexes", of the MySQL reference manual:

Currently, MySQL does not support merging multiple ranges for the range access method for spatial indexes. To work around this limitation, you can use a UNION with identical SELECT statements, except that you put each spatial predicate in a different SELECT.

So you need a UNION of two different SELECTs.

Your query does merge the multiple ranges.

It uses the range access method to combine the multiple ranges on ps, which leads the join (shown as a range on PRIMARY in the first EXPLAIN row).

For each row returned from ps, it checks the best method to retrieve the matching rows from p for the given values of ps.lft and ps.rgt. Depending on the selectivity, that may be either a fullscan over p or an index lookup over one of the two possible indexes (lft or lftRgt).
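What "Range checked for each record" means can be simulated in a few lines of Python. The join loops over the selected parent rows and, for each one, decides afresh how to read the child rows. This is a simplified model, not MySQL's cost model: a sorted list stands in for the B-Tree index on lft, the row count is exact rather than estimated, and `fullscan_threshold` is a hypothetical cutoff.

```python
from bisect import bisect_left, bisect_right

# Sample rows from the question: (id, lft, rgt), in id order.
PROJECTS = [(1, 1, 6), (2, 2, 3), (3, 4, 5), (4, 7, 10), (5, 8, 9), (6, 11, 12), (7, 13, 14)]

# Stand-in for a B-Tree index on lft: (lft, id) pairs sorted by lft.
LFT_INDEX = sorted((lft, pid) for pid, lft, _rgt in PROJECTS)
LFT_KEYS = [lft for lft, _pid in LFT_INDEX]


def range_checked_join(parent_ids, fullscan_threshold):
    """Nested-loop join that re-chooses the inner access method for every outer row.

    fullscan_threshold is a hypothetical cutoff: when the fraction of child rows
    falling in the parent's [lft, rgt] range exceeds it, scan the whole table.
    """
    bounds = {pid: (lft, rgt) for pid, lft, rgt in PROJECTS}
    result, plan = [], []
    for parent in parent_ids:  # outer loop: the selected parent rows
        lo, hi = bounds[parent]
        start = bisect_left(LFT_KEYS, lo)
        stop = bisect_right(LFT_KEYS, hi)
        fraction = (stop - start) / len(PROJECTS)  # exact here; MySQL estimates it
        if fraction > fullscan_threshold:
            plan.append((parent, "fullscan"))
            result.extend(pid for pid, lft, _rgt in PROJECTS if lo <= lft <= hi)
        else:
            plan.append((parent, "range"))
            result.extend(pid for _lft, pid in LFT_INDEX[start:stop])
    return sorted(result), plan


print(range_checked_join([1, 4], fullscan_threshold=0.5))
# ([1, 2, 3, 4, 5], [(1, 'range'), (4, 'range')])
print(range_checked_join([1, 4], fullscan_threshold=0.3))
# ([1, 2, 3, 4, 5], [(1, 'fullscan'), (4, 'range')])
```

With the lower cutoff, the first parent switches to a fullscan while the second keeps its range scan. The result is the same either way; only the access path changes per outer row.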

The number of rows shown in EXPLAIN for this table means little here: with "Range checked for each record", EXPLAIN just shows the worst possible outcome. It doesn't necessarily mean that all these rows will be examined. Whether they will be, the optimizer can only tell at runtime.

The documentation snippet about the impossibility of merging multiple ranges applies only to SPATIAL indexes (the R-Tree indexes you create over GEOMETRY types). These indexes are good for queries that search upwards (the ancestors of a given project) but not downwards.
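For contrast, here is the "upwards" query (the ancestors of a given project) in nested-set SQL, sketched with `sqlite3` as a stand-in for MySQL; `ancestors` is a hypothetical helper. Once the target row n is known, the conditions are an upper bound on a.lft and a lower bound on a.rgt, i.e. a point-in-interval test. A B-Tree index on lft can narrow only the a.lft side, whereas an index over the (lft, rgt) interval as a whole could in principle use both. That is the kind of search the answer says SPATIAL indexes suit.

```python
import sqlite3

# In-memory SQLite stand-in for the projects table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, lft INT, rgt INT)")
conn.executemany(
    "INSERT INTO projects (id, lft, rgt) VALUES (?, ?, ?)",
    [(1, 1, 6), (2, 2, 3), (3, 4, 5), (4, 7, 10), (5, 8, 9), (6, 11, 12), (7, 13, 14)],
)

# Ancestors of project n: every a whose [lft, rgt] interval strictly encloses n's.
ANCESTORS_SQL = """
    SELECT a.id
    FROM projects AS a, projects AS n
    WHERE n.id = ?
      AND a.lft < n.lft
      AND a.rgt > n.rgt
    ORDER BY a.lft
"""


def ancestors(conn, project_id):
    """Return ancestor ids ordered from the root down to the direct parent."""
    return [row[0] for row in conn.execute(ANCESTORS_SQL, (project_id,))]


print(ancestors(conn, 3))  # [1]
print(ancestors(conn, 5))  # [4]
print(ancestors(conn, 7))  # []
```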

A plain B-Tree index can combine multiple ranges. From the documentation:

For all types of indexes, multiple range conditions combined with OR or AND form a range condition.
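One way to act on this, not taken from either answer, is a two-step approach. First read the lft/rgt bounds of the selected parents. Then query the children with literal `lft BETWEEN ... OR lft BETWEEN ...` conditions, so the optimizer sees constant ranges on one indexed column instead of ranges that depend on another table's rows. The sketch uses `sqlite3` as a stand-in for MySQL, and `subprojects_by_ranges` is a hypothetical helper. Whether MySQL then picks a range scan still depends on its cost estimates.

```python
import sqlite3

# In-memory SQLite stand-in for the projects table, with a plain index on lft.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, lft INT, rgt INT)")
conn.execute("CREATE INDEX ix_lft ON projects (lft)")
conn.executemany(
    "INSERT INTO projects (id, lft, rgt) VALUES (?, ?, ?)",
    [(1, 1, 6), (2, 2, 3), (3, 4, 5), (4, 7, 10), (5, 8, 9), (6, 11, 12), (7, 13, 14)],
)


def subprojects_by_ranges(conn, parent_ids):
    """Step 1: read the parents' bounds. Step 2: query children with literal ranges."""
    parent_ids = list(parent_ids)
    if not parent_ids:
        return []
    marks = ", ".join(["?"] * len(parent_ids))
    bounds = conn.execute(
        f"SELECT lft, rgt FROM projects WHERE id IN ({marks})", parent_ids
    ).fetchall()
    if not bounds:
        return []
    # "lft BETWEEN ? AND ? OR lft BETWEEN ? AND ? ...": ranges on one indexed column.
    where = " OR ".join(["lft BETWEEN ? AND ?"] * len(bounds))
    params = [value for pair in bounds for value in pair]
    sql = f"SELECT id FROM projects WHERE {where} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


print(subprojects_by_ranges(conn, [1, 4]))  # [1, 2, 3, 4, 5]
```

Only placeholders are interpolated into the SQL text; the values themselves are bound as parameters.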

The real problem is that MySQL's optimizer cannot make one correct decision for the whole query: either a single fullscan over p, or several range scans on p (one per selected project).

Say you have 10,000 rows and your project boundaries are 0-500 and 2000-2500. The optimizer will see that each range benefits from the index, so the range check will result in two range accesses, while a single fullscan would be better.

It may be even worse if your project boundaries are, say, 0-3000 and 5000-6000. In this case the optimizer will make two fullscans, where one would suffice.
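The arithmetic behind these two scenarios can be worked through in a short sketch. It treats the boundaries as half-open spans of row positions in lft order, which is an assumption because the answer does not say. `fullscan_above_percent` is a hypothetical cutoff, used only to show how deciding each range separately can produce two full-table scans even though the ranges together cover well under half the table.

```python
TOTAL_ROWS = 10_000  # table size used in the answer's example


def coverage(ranges, total_rows=TOTAL_ROWS):
    """Whole-percent share of the table for each (start, end) range, plus the combined share.

    Assumes half-open, non-overlapping spans of row positions in lft order.
    """
    per_range = [100 * (end - start) // total_rows for start, end in ranges]
    combined = 100 * sum(end - start for start, end in ranges) // total_rows
    return per_range, combined


def plan_per_range(ranges, fullscan_above_percent, total_rows=TOTAL_ROWS):
    """Decide each range on its own, as a per-record range check does.

    fullscan_above_percent is a hypothetical cutoff, not MySQL's real cost model.
    """
    per_range, _combined = coverage(ranges, total_rows)
    return ["fullscan" if pct > fullscan_above_percent else "range" for pct in per_range]


print(coverage([(0, 500), (2000, 2500)]))   # ([5, 5], 10)
print(coverage([(0, 3000), (5000, 6000)]))  # ([30, 10], 40)
print(plan_per_range([(0, 3000), (5000, 6000)], fullscan_above_percent=5))
# ['fullscan', 'fullscan']
```

With that cutoff, the first scenario stays at two range scans. The second becomes two scans of the whole table, even though its ranges add up to only 40% of the rows.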

To help the optimizer make the correct decision, you should create a covering index on (lft, id), in this order:

CREATE INDEX ix_lft_id ON projects (lft, id)

With a covering index, the tipping point for doing a fullscan over the index rather than a range scan is 90% of the rows. Two disjoint ranges can never both exceed that, so you will never have more than one fullscan in your actual plan.
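The effect of a covering index can be observed with SQLite's `EXPLAIN QUERY PLAN`, used here as a stand-in; MySQL reports the same situation as `Using index` in the Extra column of EXPLAIN. In this sketch `id` is declared `INT PRIMARY KEY` rather than `INTEGER PRIMARY KEY`, so SQLite keeps it as an ordinary column. Only an index that includes `id` can then answer the query without reading the table. One caveat: in InnoDB every secondary index already carries the primary key, so there an index on lft alone covers `SELECT id ... WHERE lft ...`. The explicit (lft, id) index matters mainly for engines such as MyISAM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INT PRIMARY KEY (not INTEGER PRIMARY KEY) keeps id a normal column, not the rowid.
conn.execute("CREATE TABLE projects (id INT PRIMARY KEY, lft INT, rgt INT)")
conn.executemany(
    "INSERT INTO projects (id, lft, rgt) VALUES (?, ?, ?)",
    [(1, 1, 6), (2, 2, 3), (3, 4, 5), (4, 7, 10), (5, 8, 9), (6, 11, 12), (7, 13, 14)],
)
conn.execute("CREATE INDEX ix_lft_id ON projects (lft, id)")

# Children of project 1 (lft 1 .. rgt 6), reading only lft and id.
QUERY = "SELECT id FROM projects WHERE lft BETWEEN ? AND ? ORDER BY lft"

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1, 6))]
ids = [row[0] for row in conn.execute(QUERY, (1, 6))]
print(plan)  # exact wording varies by SQLite version
print(ids)   # [1, 2, 3]
```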

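Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM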