Optimize MySQL range query with group by
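I have a table with the temperature for every day (a huge table) and a table with period start and end dates (a small table). Now I want the average temperature for each period, but the query takes a very long time. Can it be improved?
NOTE: After upgrading to version 5.6.19-1~exp1ubuntu2 the long response time disappeared, so it was probably caused by a bug in MySQL versions before 5.6.8 (see Quassnoi's comment).
To rebuild the days and periods tables with random data: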
create table days (
day int not null auto_increment primary key,
temperature float not null);
insert into days values(null,rand()),(null,rand()),
(null,rand()),(null,rand()),(null,rand()),(null,rand()),
(null,rand()),(null,rand()); # 8 rows
insert into days select null, d1.temperature
from days d1, days d2, days d3, days d4,
days d5, days d6, days d7; # 2M rows
create table periods(id int not null auto_increment primary key,
first int not null,
last int not null,
index(first) using btree,
index(last) using btree,
index(first,last) using btree);
# add 10 periods of 1-11 days each
insert into periods(first,last)
select floor(rand(day)*2000000), floor(rand(day)*2000000 + rand()*10)
from days limit 10;
Listing all daily temperatures for each period is no problem (returns in 1 ms):
select id, temperature
from periods join days on day >= first and day <= last;
Now, with GROUP BY, it is actually quite slow (~1750 ms):
# ALT1
select id, avg(temperature)
from periods join days on day >= first and day <= last group by id;
Replacing <= and >= with BETWEEN speeds it up slightly (~1600 ms):
# ALT2
select id, avg(temperature)
from periods join days on day between first and last group by id;
It turns out that the result for a single period is returned immediately (1 ms):
select id, (select avg(temperature)
from days where day >= first and day <= last) from periods
where id=1;
Without the WHERE, however, it takes 4200 ms, an average of 420 ms per period!
# ALT3
select id,
(select avg(temperature) from days where day >= first and day <= last)
from periods;
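What makes the queries so slow, even (much) more than 10 times slower than the result for a single period, although the periods table has only 10 rows? Is there any way to optimize this query?
EDIT: More info: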
mysql> select @@version;
+-------------------------+
| @@version |
+-------------------------+
| 5.5.41-0ubuntu0.14.04.1 |
+-------------------------+
# ALT1
mysql> explain select id, avg(temperature) from periods join days on day >= first and day <= last group by id;
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | periods | index | first,last,first_2 | first_2 | 8 | NULL | 10 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | days | ALL | PRIMARY | NULL | NULL | NULL | 2097596 | Using where; Using join buffer |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+
# ALT1 without GROUP BY
mysql> explain select id, temperature from periods join days on day >= first and day <= last;
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | periods | index | first,last,first_2 | first_2 | 8 | NULL | 10 | Using index |
| 1 | SIMPLE | days | ALL | PRIMARY | NULL | NULL | NULL | 2097596 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+------------------------------------------------+
# ALT2
mysql> explain select id, avg(temperature) from periods join days on day between first and last group by id;
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | periods | index | first,last,first_2 | first_2 | 8 | NULL | 10 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | days | ALL | PRIMARY | NULL | NULL | NULL | 2097596 | Using where; Using join buffer |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+
# ALT3
mysql> explain select id, (select avg(temperature) from days where day >= first and day <= last) from periods;
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+-------------+
| 1 | PRIMARY | periods | index | NULL | first_2 | 8 | NULL | 10 | Using index |
| 2 | DEPENDENT SUBQUERY | days | ALL | PRIMARY | NULL | NULL | NULL | 2097596 | Using where |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+-------------+
# ALT3 with where
mysql> explain select id, (select avg(temperature) from days where day >= first and day <= last) from periods where id = 1;
+----+--------------------+---------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | PRIMARY | periods | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 2 | DEPENDENT SUBQUERY | days | range | PRIMARY | PRIMARY | 4 | NULL | 10 | Using where |
+----+--------------------+---------+-------+---------------+---------+---------+-------+------+-------------+
EDIT2: Execution plan for the nested query in FROM, as suggested by Lennart (query execution time 3 ms):
mysql> explain select id,avg(temperature) from (select id,temperature from periods join days on day between first and last) as t group by id;
+----+-------------+------------+-------+--------------------+---------+---------+------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+--------------------+---------+---------+------+----------+------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | Using temporary; Using filesort |
| 2 | DERIVED | periods | index | first,last,first_2 | first_2 | 8 | NULL | 10 | Using index |
| 2 | DERIVED | days | range | PRIMARY,day | PRIMARY | 4 | NULL | 5 | Range checked for each record (index map: 0x3) |
+----+-------------+------------+-------+--------------------+---------+---------+------+----------+------------------------------------------------+
Here is a not-so-nice trick. Since:
select id, temperature
from periods join days
on day between first and last;
is fast, we can try to nudge the optimizer into evaluating it first. Just using a subquery is not enough:
select id, avg(temperature)
from (
select id, temperature
from periods
join days
on day between first and last
) as t
group by id;
[...]
10 rows in set (1.67 sec)
However, calling a non-deterministic function in the subquery seems to do the trick:
select id, avg(temperature)
from (
select id, temperature, rand()
from periods
join days
on day between first and last
) as t
group by id;
[...]
10 rows in set (0.00 sec)
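I would stay away from such tricks unless they are critical and necessary. As the optimizer gets better (perhaps in the next fix), it may skip the call to rand(), and all of a sudden you get your old plan and performance back.
If you do use tricks like this, make sure to document them carefully in your code so they can be cleaned up when they are no longer needed.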
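As a rough illustration of that advice, the workaround could carry its own explanation as a comment. This is only a sketch: the alias `optimizer_hint_col` is a made-up name, and the comment wording is an assumption about what a future maintainer would need to know.

```sql
# WORKAROUND: the unused rand() column appears to make the server evaluate
# the fast range join first (observed above: 0.00 sec instead of 1.67 sec).
# It relies on undocumented optimizer behaviour. Remove it once the plain
# GROUP BY query is fast without it (the question reports this after
# upgrading to MySQL 5.6.19).
select id, avg(temperature)
from (
select id, temperature, rand() as optimizer_hint_col  # hypothetical alias, value unused
from periods
join days
on day between first and last
) as t
group by id;
```

Re-running EXPLAIN after every server upgrade is one way to notice when the comment no longer applies.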
MariaDB [test]> select @@version;
+-----------------+
| @@version |
+-----------------+
| 10.0.20-MariaDB |
+-----------------+
1 row in set (0.00 sec)
explain select id, avg(temperature) from periods join days on day between first and last group by id;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | periods | index | first,last,first_2 | first_2 | 8 | NULL | 10 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | days | ALL | PRIMARY | NULL | NULL | NULL | 2094315 | Using where; Using join buffer (flat, BNL join) |
explain select id, avg(temperature) from (select id, temperature from periods join days on day between first and last) as t group by id;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | periods | index | first,last,first_2 | first_2 | 8 | NULL | 10 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | days | ALL | PRIMARY | NULL | NULL | NULL | 2094315 | Using where; Using join buffer (flat, BNL join) |
explain select id, avg(temperature) from (select id, temperature, rand() from periods join days on day between first and last) as t group by id;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | Using temporary; Using filesort |
| 2 | DERIVED | periods | index | first,last,first_2 | first_2 | 8 | NULL | 10 | Using index |
| 2 | DERIVED | days | ALL | PRIMARY | NULL | NULL | NULL | 2094315 | Range checked for each record (index map: 0x1) |
Try creating a composite covering index on days (day, temperature). It should improve your speed.
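A minimal sketch of that suggestion, assuming the `days` table from the question. The index name `day_temperature` is invented for illustration, and whether the optimizer actually uses the index for the range join depends on the server version, so it is worth checking the plan afterwards:

```sql
# Hypothetical index name; columns as defined in the question's days table.
create index day_temperature on days (day, temperature);

# Re-check the plan. The hope is that days no longer shows a full scan
# (type ALL) and that Extra reports "Using index".
explain select id, avg(temperature)
from periods join days on day between first and last
group by id;
```

If the plan is unchanged, the index only adds write overhead and can be dropped again with `drop index day_temperature on days;`.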