MySQL: why is a query using a VIEW less efficient than a query that uses the view's underlying JOIN directly?
I have three tables, bug, bugrule and bugtrace, with the following relationships:
bug 1--------N bugrule
id = bugid
bugrule 0---------N bugtrace
id = ruleid
Because I'm almost always interested in the relation bug <---> bugtrace, I have created an appropriate VIEW which is used as part of several queries. Interestingly, queries using this VIEW have significantly worse performance than equivalent queries using the underlying JOIN explicitly.
VIEW definition:
CREATE VIEW bugtracev AS
SELECT t.*, r.bugid
FROM bugtrace AS t
LEFT JOIN bugrule AS r ON t.ruleid=r.id
WHERE r.version IS NULL
Execution plan for a query using the VIEW (bad performance):
mysql> explain
SELECT c.id,state,
(SELECT COUNT(DISTINCT(t.id)) FROM bugtracev AS t
WHERE t.bugid=c.id)
FROM bug AS c
WHERE c.version IS NULL
AND c.id<10;
+----+--------------------+-------+-------+---------------+--------+---------+-----------------+---------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+--------+---------+-----------------+---------+-----------------------+
| 1 | PRIMARY | c | range | id_2,id | id_2 | 8 | NULL | 3 | Using index condition |
| 2 | DEPENDENT SUBQUERY | t | index | NULL | ruleid | 9 | NULL | 1426004 | Using index |
| 2 | DEPENDENT SUBQUERY | r | ref | id_2,id | id_2 | 8 | bugapp.t.ruleid | 1 | Using where |
+----+--------------------+-------+-------+---------------+--------+---------+-----------------+---------+-----------------------+
3 rows in set (0.00 sec)
Execution plan for a query using the underlying JOIN directly (good performance):
mysql> explain
SELECT c.id,state,
(SELECT COUNT(DISTINCT(t.id))
FROM bugtrace AS t
LEFT JOIN bugrule AS r ON t.ruleid=r.id
WHERE r.version IS NULL
AND r.bugid=c.id)
FROM bug AS c
WHERE c.version IS NULL
AND c.id<10;
+----+--------------------+-------+-------+---------------+--------+---------+-------------+--------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------+--------+---------+-------------+--------+-----------------------+
| 1 | PRIMARY | c | range | id_2,id | id_2 | 8 | NULL | 3 | Using index condition |
| 2 | DEPENDENT SUBQUERY | r | ref | id_2,id,bugid | bugid | 8 | bugapp.c.id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | t | ref | ruleid | ruleid | 9 | bugapp.r.id | 713002 | Using index |
+----+--------------------+-------+-------+---------------+--------+---------+-------------+--------+-----------------------+
3 rows in set (0.00 sec)
The CREATE TABLE statements (with irrelevant columns omitted) are:
mysql> show create table bug;
CREATE TABLE `bug` (
`id` bigint(20) NOT NULL,
`version` int(11) DEFAULT NULL,
`state` varchar(16) DEFAULT NULL,
UNIQUE KEY `id_2` (`id`,`version`),
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
mysql> show create table bugrule;
CREATE TABLE `bugrule` (
`id` bigint(20) NOT NULL,
`version` int(11) DEFAULT NULL,
`bugid` bigint(20) NOT NULL,
UNIQUE KEY `id_2` (`id`,`version`),
KEY `id` (`id`),
KEY `bugid` (`bugid`),
CONSTRAINT `bugrule_ibfk_1` FOREIGN KEY (`bugid`) REFERENCES `bug` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
mysql> show create table bugtrace;
CREATE TABLE `bugtrace` (
`id` bigint(20) NOT NULL,
`ruleid` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ruleid` (`ruleid`),
CONSTRAINT `bugtrace_ibfk_1` FOREIGN KEY (`ruleid`) REFERENCES `bugrule` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
You're asking why the optimizer treats two complex queries differently, both of which use COUNT(DISTINCT val) and a dependent subquery. It's hard to know for sure. You will probably fix most of your performance problem by getting rid of the dependent subquery, though. Try something like this:
SELECT c.id,state, cnt.cnt
FROM bug AS c
LEFT JOIN (
SELECT bugid, COUNT(DISTINCT id) cnt
FROM bugtracev
GROUP BY bugid
) cnt ON c.id = cnt.bugid
WHERE c.version IS NULL
AND c.id<10;
Why does this help? To satisfy the query, the optimizer can choose to run the GROUP BY subquery just once, rather than once per row of bug. You can also use EXPLAIN on the GROUP BY subquery to understand its performance.
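As a minimal sketch of that last suggestion, the aggregate can be examined on its own before it is joined to bug. The second statement is an optional variant, not part of the original suggestion: the correlated subquery in the question yields 0 for a bug with no traces, while a LEFT JOIN to the aggregate yields NULL, so COALESCE is used here on the assumption that 0 is the result you want.

```sql
-- Plan for the aggregate alone, without the outer query
EXPLAIN
SELECT bugid, COUNT(DISTINCT id) AS cnt
FROM bugtracev
GROUP BY bugid;

-- Same join as above, but reporting 0 instead of NULL for bugs without traces
SELECT c.id, c.state, COALESCE(cnt.cnt, 0) AS cnt
FROM bug AS c
LEFT JOIN (
    SELECT bugid, COUNT(DISTINCT id) AS cnt
    FROM bugtracev
    GROUP BY bugid
) AS cnt ON c.id = cnt.bugid
WHERE c.version IS NULL
  AND c.id < 10;
```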
You may also get a performance boost by creating a compound index on bugrule that matches the query in your view. Try this one:
CREATE INDEX bugrule_v1 ON bugrule (version, id, bugid);
Also try one with the last two columns switched, like so:
CREATE INDEX bugrule_v2 ON bugrule (version, bugid, id);
These are called covering indexes because they contain all the bugrule columns needed to satisfy your query. version appears first because that helps optimize WHERE version IS NULL in your view definition, which makes the lookup faster.
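One way to check whether a new index is actually picked up is to look at the plan for the view's underlying join. This is only a sketch: the SELECT list is cut down to columns that appear in the reduced table definitions above, and the optimizer may still prefer another index depending on your data.

```sql
-- List the indexes currently defined on bugrule
SHOW INDEX FROM bugrule;

-- Plan for the join behind the view. For table r, check the "key" column
-- for the index chosen; "Using index" in "Extra" indicates that the rows
-- were read from the index alone (a covering index).
EXPLAIN
SELECT t.id, r.bugid
FROM bugtrace AS t
LEFT JOIN bugrule AS r ON t.ruleid = r.id
WHERE r.version IS NULL;
```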
Pro tip: Avoid using SELECT * in views and queries, especially when you have performance problems. Instead, list the columns you actually need. The * may force the query optimizer to avoid a covering index, even when the index would help.
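Applied to the view in the question, that could look like the sketch below. It assumes that id and ruleid are the only bugtrace columns your queries need. The real table has more columns than the reduced definition shows, so adjust the list accordingly.

```sql
-- Redefine the view with an explicit column list instead of t.*
CREATE OR REPLACE VIEW bugtracev AS
SELECT t.id, t.ruleid, r.bugid
FROM bugtrace AS t
LEFT JOIN bugrule AS r ON t.ruleid = r.id
WHERE r.version IS NULL;
```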
If you are using MySQL 5.6 or older, try again with at least MySQL 5.7. According to "What's New in MySQL 5.7?":
"We have to a large extent unified the handling of derived tables and views. Until now, subqueries in the FROM clause (derived tables) were unconditionally materialized, while views created from the same query expressions were sometimes materialized and sometimes merged into the outer query. This behavior, beside being inconsistent, can lead to a serious performance penalty."
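To see which side of that change a given server is on, a quick check along these lines may help. It is a sketch only: it assumes a server where the optimizer_switch variable carries a derived_merge flag, which is the case on 5.7 and later. On older servers the flag is absent and the second query simply returns 0.

```sql
-- Server version in use
SELECT VERSION();

-- 1 if merging of derived tables and views into the outer query is enabled
SELECT @@optimizer_switch LIKE '%derived_merge=on%' AS derived_merge_enabled;
```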