[英]MySql group by optimization - avoid tmp table and/or filesort
我的查詢速度很慢,沒有按[group by](0.1-0.3秒)的速度進行分組,但是(按要求)分組的持續時間約為10-15s。
該查詢連接兩個表,事件(近5000萬行)和events_locations(500萬行)。
查詢:
SELECT `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id`AS `entity_type_id`, el.some_id
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1
AND el.other_id = '1'
AND time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;
表事件:
CREATE TABLE `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_type_id` int(11) NOT NULL,
`entity_type_id` int(11) NOT NULL,
`entity_id` varchar(64) NOT NULL,
`alias` varchar(64) NOT NULL,
`time_stamp` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `entity_id` (`entity_id`),
KEY `event_type_idx` (`event_type_id`),
KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
表events_locations
CREATE TABLE `events_locations` (
`event_id` bigint(20) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
`some_id` bigint(20) DEFAULT NULL,
`other_id` bigint(20) DEFAULT NULL,
`time_span` bigint(20) DEFAULT NULL,
`group_alias` varchar(64) NOT NULL,
KEY `some_id_idx` (`some_id`),
KEY `idx_events_group_alias` (`group_alias`),
KEY `idx_event_id` (`event_id`),
CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
解釋:
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| 1 | SIMPLE | ea | ALL | 'idx_event_id' | NULL | NULL | NULL | 5152834 | 'Using where; Using temporary; Using filesort' |
| 1 | SIMPLE | e | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8' | 'name.ea.event_id' | 1 | |
+----+-------------+----------------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
2 rows in set (0.08 sec)
從文檔 :
可以在以下條件下創建臨時表:
如果有一個ORDER BY子句和另一個GROUP BY子句,或者ORDER BY或GROUP BY包含聯接隊列中第一個表以外的表中的列,則會創建一個臨時表。
DISTINCT與ORDER BY結合使用可能需要一個臨時表。
如果使用SQL_SMALL_RESULT選項,則MySQL使用內存中臨時表,除非查詢還包含需要磁盤存儲的元素(稍后描述)。
我已經嘗試過:
el
創建索引。 some_id
, el
。 group_alias
' 對於性能調整的任何建議將不勝感激!
在您的情況下, events
表具有time_span
作為索引屬性。 因此,在加入兩個表之前,首先從events
表中為特定日期范圍選擇具有所需詳細信息的所需記錄。 然后使用表關系屬性加入event_location
。
檢查您的MySql Explain
關鍵字,以檢查您如何處理表記錄。 它會告訴您在選擇所需記錄之前要掃描多少行。
掃描的行數也涉及查詢執行時間。 使用我的以下邏輯減少掃描的行數。
SELECT
`e`.`id` AS `event_id`,
`e`.`time_stamp` AS `time_stamp`,
`el`.`latitude` AS `latitude`,
`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,
`e`.`entity_id` AS `asset_name`,
`el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,
`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`,
`el`.`some_id` as `some_id`
FROM
(select
`id` AS `event_id`,
`time_stamp` AS `time_stamp`,
`entity_id` AS `asset_name`,
`event_type_id` AS `event_type_id`,
`entity_type_id` AS `entity_type_id`
from
`events`
WHERE
time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
) AS `e`
JOIN `events_locations` `el` ON `e`.`event_id` = `el`.`event_id`
WHERE
`el`.`other_id` = '1'
GROUP BY
`e`.`event_type_id` ,
`el`.`some_id` ,
`el`.`group_alias`;
這些表之間的關系是1:1,所以,我問我為什么要按要求分組,我發現了一些重復的行,即50000行中有200行。 因此,以某種方式,我的系統正在插入重復項,並且有人(幾年前)按該組放置而不是查找錯誤。
所以,我或多或少會將此標記為已解決...
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.