Optimizing MySQL select distinct order by limit safely

I have a problematic query that I know how to write faster, but the faster form is technically invalid SQL and has no guarantee of working correctly in the future.

The original, slow query looks like this:

SELECT sql_no_cache DISTINCT r.field_1 value
FROM table_middle m
JOIN table_right r on r.id = m.id
WHERE ((r.field_1) IS NOT NULL) 
AND (m.kind IN ('partial')) 
ORDER BY r.field_1 
LIMIT 26

This takes about 37 seconds. Explain output:

+----+-------------+-------+--------+-----------------------+---------------+---------+---------+-----------------------------------------------------------+
| id | select_type | table | type   | possible_keys         | key           | key_len | rows    | Extra                                                     |
+----+-------------+-------+--------+-----------------------+---------------+---------+---------+-----------------------------------------------------------+
|  1 | SIMPLE      | r     | range  | PRIMARY,index_field_1 | index_field_1 | 9       | 1544595 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | m     | eq_ref | PRIMARY,index_kind    | PRIMARY       | 4       |       1 | Using where; Distinct                                     |
+----+-------------+-------+--------+-----------------------+---------------+---------+---------+-----------------------------------------------------------+

The faster version looks like this: the ORDER BY clause is pushed down into a subquery that contains the join, and the outer query then applies DISTINCT and LIMIT to its result:

SELECT sql_no_cache DISTINCT value 
FROM (
  SELECT r.field_1 value
  FROM table_middle m
  JOIN table_right r ON r.id = m.id
  WHERE ((r.field_1) IS NOT NULL) 
  AND (m.kind IN ('partial')) 
  ORDER BY r.field_1 
) t
LIMIT 26

This takes about 2.7 seconds. Explain output:

+----+-------------+------------+--------+-----------------------+------------+---------+---------+-----------------------------------------------------------+
| id | select_type | table      | type   | possible_keys         | key        | key_len | rows    | Extra                                                     |
+----+-------------+------------+--------+-----------------------+------------+---------+---------+-----------------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                  | NULL       | NULL    | 1346348 | Using temporary                                           |
|  2 | DERIVED     | m          | ref    | PRIMARY,index_kind    | index_kind | 99      | 1539558 | Using where; Using index; Using temporary; Using filesort |
|  2 | DERIVED     | r          | eq_ref | PRIMARY,index_field_1 | PRIMARY    | 4       |       1 | Using where                                               |
+----+-------------+------------+--------+-----------------------+------------+---------+---------+-----------------------------------------------------------+

There are three million rows in table_right and table_middle, and all mentioned columns are individually indexed. The query should be read as having an arbitrary WHERE clause; it's dynamically generated. The query can't be rewritten in any way that prevents the WHERE clause from being easily replaced, and similarly the indexes can't be changed: MySQL doesn't support enough indexes for the number of potential filter field combinations.

Has anyone seen this problem before (specifically, select / distinct / order by / limit performing very poorly), and is there another way to write the same query with good performance that doesn't rely on unspecified implementation behaviour?

(AFAIK MariaDB, for example, ignores ORDER BY in a subquery because it should not logically affect the set-theoretic semantics of the query.)

For the more incredulous

Here's how you can create a version of the database for yourself! This should output a SQL script you can run with the mysql command-line client:

#!/usr/bin/env ruby
puts "create database testy;"
puts "use testy;"
# field_2 is bigint because x*1000 reaches about 3 billion, beyond the signed int range.
puts "create table table_right(id int(11) not null primary key, field_0 int(11), field_1 int(11), field_2 bigint);"
puts "create table table_middle(id int(11) not null primary key, field_0 int(11), field_1 int(11), field_2 bigint);"
puts "begin;"
3_000_000.times do |x|
  puts "insert into table_middle values (#{x},#{x*10},#{x*100},#{x*1000});"
  puts "insert into table_right values (#{x},#{x*10},#{x*100},#{x*1000});"
end
puts "commit;"

Indexes aren't important for reproducing the effect. The script above is untested; it's an approximation of a pry session I had when reproducing the problem manually.

Replace the m.kind in ('partial') with m.field_1 > 0 or something similar that's trivially true. Observe the large difference in performance between the two techniques, and how the sorting semantics are preserved (tested using MySQL 5.5). The unreliability of the semantics is, of course, precisely the reason I'm asking the question.
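For reference, here is a sketch of how the two queries would read against the generated test tables, with the kind filter swapped for m.field_1 > 0 as described. Like the script, it is an untested approximation rather than a verified reproduction.

```sql
-- Slow form: DISTINCT, ORDER BY and LIMIT all in one query block.
SELECT sql_no_cache DISTINCT r.field_1 value
FROM table_middle m
JOIN table_right r ON r.id = m.id
WHERE r.field_1 IS NOT NULL
  AND m.field_1 > 0
ORDER BY r.field_1
LIMIT 26;

-- Fast form: ORDER BY pushed into the derived table.
-- The outer query has no ORDER BY, so the result order is not guaranteed.
SELECT sql_no_cache DISTINCT value
FROM (
  SELECT r.field_1 value
  FROM table_middle m
  JOIN table_right r ON r.id = m.id
  WHERE r.field_1 IS NOT NULL
    AND m.field_1 > 0
  ORDER BY r.field_1
) t
LIMIT 26;
```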

Please provide SHOW CREATE TABLE. In the absence of that, I will guess that these indexes are missing and may be useful:

m:  (kind, id)
r:  (field_1, id)
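
Spelled out as DDL, those two guesses might look like the following. The index names are made up for illustration, and whether either index helps depends on the real table definitions, which the question does not show.

```sql
-- Hypothetical index names; the column lists are the ones suggested above.
ALTER TABLE table_middle ADD INDEX idx_kind_id (kind, id);
ALTER TABLE table_right ADD INDEX idx_field_1_id (field_1, id);
```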

You can turn off MariaDB's ignoring of the subquery's ORDER BY.
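
One commonly cited way to do this, as far as I know, is to give the subquery a LIMIT, since the optimizer is only free to discard a derived table's ORDER BY when no LIMIT is attached to it. The sketch below applies that to the faster query from the question. The very large LIMIT value is just a stand-in for "no limit", and the outer query still has no ORDER BY of its own, so the final result order remains an implementation detail rather than a guarantee.

```sql
SELECT DISTINCT value
FROM (
  SELECT r.field_1 value
  FROM table_middle m
  JOIN table_right r ON r.id = m.id
  WHERE r.field_1 IS NOT NULL
    AND m.kind IN ('partial')
  ORDER BY r.field_1
  -- Assumption: the LIMIT keeps the optimizer from dropping the ORDER BY above.
  LIMIT 18446744073709551615
) t
LIMIT 26;
```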
