Optimizing a MySQL range query with GROUP BY

Question

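I have a table with daily temperatures (a huge table) and a table with period start and end dates (a small table). Now I want to know the average temperature for each period, but the query takes a long time. Can it be improved?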

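NOTE: After upgrading to version 5.6.19-1~exp1ubuntu2, the long response times disappeared. They were probably caused by a bug in MySQL versions before 5.6.8 (see Quassnoi's comment).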

To recreate the days and periods tables with random data:

create table days (
  day int not null auto_increment primary key,
  temperature float not null);

insert into days values(null,rand()),(null,rand()),
  (null,rand()),(null,rand()),(null,rand()),(null,rand()),
  (null,rand()),(null,rand()); # 8 rows

insert into days select null, d1.temperature
  from days d1, days d2, days d3, days d4,
  days d5, days d6, days d7; # 2M rows

create table periods(id int not null auto_increment primary key,
  first int not null,
  last int not null,
  index(first) using btree,
  index(last) using btree,
  index(first,last) using btree);

# add 10 periods of 1-11 days each
insert into periods(first,last)
  select floor(rand(day)*2000000), floor(rand(day)*2000000 + rand()*10)
  from days limit 10;

Listing all the daily temperatures for each period is no problem (it returns in 1 ms):

select id, temperature
  from periods join days on day >= first and day <= last;

Now, with GROUP BY, it is actually very slow (~1750 ms):

# ALT1
select id, avg(temperature)
  from periods join days on day >= first and day <= last group by id;

Replacing <= and >= with BETWEEN speeds it up slightly (~1600 ms):

# ALT2
select id, avg(temperature)
  from periods join days on day between first and last group by id;

It turns out that the result for a single period is returned immediately (1 ms):

select id, (select avg(temperature)
  from days where day >= first and day <= last) from periods
  where id=1;

Without the WHERE clause, however, it takes 4200 ms, an average of 420 ms per period!

# ALT3
select id,
  (select avg(temperature) from days where day >= first and day <= last)
  from periods;

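What makes these queries so slow, (much) more than 10 times slower than the result for a single period, even though the periods table has only 10 rows? Is there a way to optimize this query?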

EDIT: More information:

mysql> select @@version;
+-------------------------+
| @@version               |
+-------------------------+
| 5.5.41-0ubuntu0.14.04.1 |
+-------------------------+

# ALT1
mysql> explain select id, avg(temperature) from periods join days on day >= first and day <= last group by id;
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+
| id | select_type | table   | type  | possible_keys      | key     | key_len | ref  | rows    | Extra                                        |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+
|  1 | SIMPLE      | periods | index | first,last,first_2 | first_2 | 8       | NULL |      10 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | days    | ALL   | PRIMARY            | NULL    | NULL    | NULL | 2097596 | Using where; Using join buffer               |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+

# ALT1 without GROUP BY
mysql> explain select id, temperature from periods join days on day >= first and day <= last;
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+------------------------------------------------+
| id | select_type | table   | type  | possible_keys      | key     | key_len | ref  | rows    | Extra                                          |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+------------------------------------------------+
|  1 | SIMPLE      | periods | index | first,last,first_2 | first_2 | 8       | NULL |      10 | Using index                                    |
|  1 | SIMPLE      | days    | ALL   | PRIMARY            | NULL    | NULL    | NULL | 2097596 | Range checked for each record (index map: 0x1) |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+------------------------------------------------+

# ALT2
mysql> explain select id, avg(temperature) from periods join days on day between first and last group by id;
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+
| id | select_type | table   | type  | possible_keys      | key     | key_len | ref  | rows    | Extra                                        |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+
|  1 | SIMPLE      | periods | index | first,last,first_2 | first_2 | 8       | NULL |      10 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | days    | ALL   | PRIMARY            | NULL    | NULL    | NULL | 2097596 | Using where; Using join buffer               |
+----+-------------+---------+-------+--------------------+---------+---------+------+---------+----------------------------------------------+

# ALT3
mysql> explain select id, (select avg(temperature) from days where day >= first and day <= last) from periods;
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+-------------+
| id | select_type        | table   | type  | possible_keys | key     | key_len | ref  | rows    | Extra       |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+-------------+
|  1 | PRIMARY            | periods | index | NULL          | first_2 | 8       | NULL |      10 | Using index |
|  2 | DEPENDENT SUBQUERY | days    | ALL   | PRIMARY       | NULL    | NULL    | NULL | 2097596 | Using where |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+-------------+

# ALT3 with where
mysql> explain select id, (select avg(temperature) from days where day >= first and day <= last) from periods where id = 1;
+----+--------------------+---------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type        | table   | type  | possible_keys | key     | key_len | ref   | rows | Extra       |
+----+--------------------+---------+-------+---------------+---------+---------+-------+------+-------------+
|  1 | PRIMARY            | periods | const | PRIMARY       | PRIMARY | 4       | const |    1 |             |
|  2 | DEPENDENT SUBQUERY | days    | range | PRIMARY       | PRIMARY | 4       | NULL  |   10 | Using where |
+----+--------------------+---------+-------+---------------+---------+---------+-------+------+-------------+

EDIT2: Execution plan for the nested query in FROM, as suggested by Lennart (query execution time 3 ms):

mysql> explain select id,avg(temperature) from (select id,temperature from periods join days on day between first and last) as t group by id;
+----+-------------+------------+-------+--------------------+---------+---------+------+----------+------------------------------------------------+
| id | select_type | table      | type  | possible_keys      | key     | key_len | ref  | rows     | Extra                                          |
+----+-------------+------------+-------+--------------------+---------+---------+------+----------+------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL               | NULL    | NULL    | NULL |       50 | Using temporary; Using filesort                |
|  2 | DERIVED     | periods    | index | first,last,first_2 | first_2 | 8       | NULL |       10 | Using index                                    |
|  2 | DERIVED     | days       | range | PRIMARY,day        | PRIMARY | 4       | NULL |        5 | Range checked for each record (index map: 0x3) |
+----+-------------+------------+-------+--------------------+---------+---------+------+----------+------------------------------------------------+

Answer 1

Here is an ugly trick. Since this query:

select id, temperature 
from periods join days 
    on day between first and last;

is fast, we can try to provoke the optimizer into evaluating it first. Just wrapping it in a subquery is not enough:

select id, avg(temperature) 
from (
    select id, temperature 
    from periods 
    join days 
        on day between first and last
) as t 
group by id;
[...]
10 rows in set (1.67 sec)

However, calling a non-deterministic function in the subquery seems to do the trick:

select id, avg(temperature) 
from (
    select id, temperature, rand() 
    from periods 
    join days 
        on day between first and last
) as t 
group by id;
[...]
10 rows in set (0.00 sec)

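I would stay away from tricks like this unless it is critical and necessary. As the optimizer gets better (perhaps as soon as the next fix), it may skip the call to rand(), and all of a sudden you are back to the old plan and the old performance.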

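If you do use tricks like this, make sure to document them carefully in the code so that they can be cleaned up once they are no longer needed.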
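As a minimal sketch of what such documentation might look like, the workaround query can carry a comment explaining why the otherwise unused rand() column is there. The version numbers and timings in the comment come from this thread; the alias optimizer_fence and the comment layout are only illustrations.

```sql
# WORKAROUND: the unused rand() column below is intentional.
# On MariaDB 10.0.20 the subquery without it is folded back into the
# outer query (EXPLAIN shows SIMPLE, full scan of `days`, ~1.67 s).
# With the non-deterministic call, EXPLAIN shows a DERIVED table and
# the range-checked join on `days` runs first.
# Re-test after every server upgrade and remove the rand() column once
# the plain GROUP BY join is fast (reported fast on MySQL 5.6.19).
select id, avg(temperature)
from (
    select id, temperature, rand() as optimizer_fence  # hypothetical alias
    from periods
    join days
        on day between first and last
) as t
group by id;
```

Running EXPLAIN on both the plain and the wrapped query after an upgrade is a quick way to see whether the workaround still changes the plan.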

MariaDB [test]> select @@version;
+-----------------+
| @@version       |
+-----------------+
| 10.0.20-MariaDB |
+-----------------+
1 row in set (0.00 sec)

explain select id, avg(temperature) from periods join days on day between first and last group by id;
| id   | select_type | table   | type  | possible_keys      | key     | key_len | ref  | rows    | Extra                                           |
|    1 | SIMPLE      | periods | index | first,last,first_2 | first_2 | 8       | NULL |      10 | Using index; Using temporary; Using filesort    |
|    1 | SIMPLE      | days    | ALL   | PRIMARY            | NULL    | NULL    | NULL | 2094315 | Using where; Using join buffer (flat, BNL join) |

explain select id, avg(temperature) from (select id, temperature from periods join days on day between first and last) as t group by id;
| id   | select_type | table   | type  | possible_keys      | key     | key_len | ref  | rows    | extra                                           |
|    1 | SIMPLE      | periods | index | first,last,first_2 | first_2 | 8       | NULL |      10 | Using index; Using temporary; Using filesort    |
|    1 | SIMPLE      | days    | ALL   | PRIMARY            | NULL    | NULL    | NULL | 2094315 | Using where; Using join buffer (flat, BNL join) |


explain select id, avg(temperature) from (select id, temperature, rand() from periods join days on day between first and last) as t group by id;
| id   | select_type | table      | type  | possible_keys      | key     | key_len | ref  | rows    | Extra                                          |
|    1 | PRIMARY     | <derived2> | ALL   | NULL               | NULL    | NULL    | NULL |       2 | Using temporary; Using filesort                |
|    2 | DERIVED     | periods    | index | first,last,first_2 | first_2 | 8       | NULL |      10 | Using index                                    |
|    2 | DERIVED     | days       | ALL   | PRIMARY            | NULL    | NULL    | NULL | 2094315 | Range checked for each record (index map: 0x1) |

Answer 2

Try creating a covering composite index on days (day, temperature). It should improve your speed.
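A minimal sketch of that suggestion follows. The index name idx_day_temperature is made up for illustration, and the thread does not report whether the index actually changes the plan, so the EXPLAIN afterwards is there to check rather than to promise a result.

```sql
# Composite index holding both columns the query reads from `days`.
# The name idx_day_temperature is hypothetical.
create index idx_day_temperature on days (day, temperature);

# Re-check the plan. If the index is used as a covering index, the
# `days` row should list it under `key` and show "Using index" in Extra.
explain select id, avg(temperature)
  from periods join days on day between first and last group by id;
```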


Solution 1
2 accepted 2015-07-01 03:23:43

Solution 2
0 2015-06-30 17:27:13
