MySql group by optimization - avoid tmp table and/or filesort

I have a slow query: without the GROUP BY it is fast (0.1-0.3 seconds), but with the (required) GROUP BY it takes around 10-15 seconds.

The query joins two tables, events (nearly 50 million rows) and events_locations (5 million rows).

Query:

SELECT  `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
        `el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
        `el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
        `e`.`entity_type_id` AS `entity_type_id`, `el`.`some_id`
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1       
    AND el.other_id = '1'  
    AND time_stamp >= '2018-01-01'  
    AND time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;

Table events:

CREATE TABLE `events` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `event_type_id` int(11) NOT NULL,
  `entity_type_id` int(11) NOT NULL,
  `entity_id` varchar(64) NOT NULL,
  `alias` varchar(64) NOT NULL,
  `time_stamp` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `entity_id` (`entity_id`),
  KEY `event_type_idx` (`event_type_id`),
  KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Table events_locations:

CREATE TABLE `events_locations` (
  `event_id` bigint(20) NOT NULL,
  `latitude` double NOT NULL,
  `longitude` double NOT NULL,
  `some_id` bigint(20) DEFAULT NULL,
  `other_id` bigint(20) DEFAULT NULL,
  `time_span` bigint(20) DEFAULT NULL,
  `group_alias` varchar(64) NOT NULL,
  KEY `some_id_idx` (`some_id`),
  KEY `idx_events_group_alias` (`group_alias`),
  KEY `idx_event_id` (`event_id`),
  CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The explain:

+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| id | select_type | table | type   | possible_keys                   | key     | key_len | ref                                       | rows     | Extra                                          |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| 1  | SIMPLE      | ea    | ALL    | 'idx_event_id'                  | NULL    | NULL    | NULL                                      | 5152834  | 'Using where; Using temporary; Using filesort' |
| 1  | SIMPLE      | e     | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8'     | 'name.ea.event_id'                        | 1        |                                                |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
2 rows in set (0.08 sec)

From the docs:

Temporary tables can be created under conditions such as these:

If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.

DISTINCT combined with ORDER BY may require a temporary table.

If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
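Applied to the query above, the first condition quoted is the one that bites: the GROUP BY mixes `e.event_type_id` with `el.some_id` and `el.group_alias`, i.e. columns from a table other than the first table in the join queue, so the temporary table and filesort shown in the EXPLAIN are expected. For contrast, a grouping restricted to the driving table's columns should, by that rule, not need one. This is only an illustrative sketch, not a drop-in replacement, since it loses the per-`some_id`/`group_alias` breakdown:

```sql
-- Grouping only on columns of the first table in the join queue:
-- by the quoted rule, this alone does not force a temporary table.
SELECT e.event_type_id, COUNT(*) AS events_in_range
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE el.other_id = 1
  AND e.time_stamp >= '2018-01-01'
  AND e.time_stamp <= '2019-06-02'
GROUP BY e.event_type_id;
```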

I already tried:

  • Creating an index on `el`.`some_id`, `el`.`group_alias`
  • Decreasing the varchar size to 20
  • Increasing sort_buffer_size and read_rnd_buffer_size

Any suggestions for performance tuning would be much appreciated!
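One further idea, sketched under the assumption that `other_id` is selective: a composite index on events_locations that leads with the filtered column and includes the grouped columns, so the `el` side of the join can be served mostly from the index. Because the GROUP BY mixes columns from both tables, this cannot eliminate the temporary table, but it can cut the rows scanned:

```sql
-- Hypothetical composite index (the name is illustrative):
-- other_id first for the equality filter, then the GROUP BY columns,
-- then event_id for the join back to events.
ALTER TABLE events_locations
  ADD INDEX idx_el_other_some_group (other_id, some_id, group_alias, event_id);
```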

In your case the events table has an index on time_stamp. So, before joining the two tables, first select the required rows from the events table for the specific date range, with the required details. Then join events_locations using the relation columns.

Run MySQL's EXPLAIN to check how the query accesses your table records. It will tell you how many rows are scanned before the required records are selected.

The number of rows scanned also factors into query execution time. Use the logic below to reduce the number of rows scanned.

SELECT  
    `e`.`id` AS `event_id`,
    `e`.`time_stamp` AS `time_stamp`,
    `el`.`latitude` AS `latitude`,
    `el`.`longitude` AS `longitude`,
    `el`.`time_span` AS `extra`,
    `e`.`entity_id` AS `asset_name`, 
    `el`.`other_id` AS `geozone_id`,
    `el`.`group_alias` AS `group_alias`,
    `e`.`event_type_id` AS `event_type_id`,
    `e`.`entity_type_id` AS `entity_type_id`, 
    `el`.`some_id` as `some_id`
FROM 
    (SELECT
        `id`,
        `time_stamp`,
        `entity_id`,
        `event_type_id`,
        `entity_type_id`
    FROM
        `events` 
    WHERE
        `time_stamp` >= '2018-01-01'  
        AND `time_stamp` <= '2019-06-02'
    ) AS `e`    
    JOIN `events_locations` `el` ON `e`.`id` = `el`.`event_id`
WHERE     
    `el`.`other_id` = '1'      
GROUP BY 
    `e`.`event_type_id` , 
    `el`.`some_id` , 
    `el`.`group_alias`;

The relationship between these tables is 1:1, so I asked myself why a GROUP BY was required, and I found some duplicated rows: 200 out of 50,000. So, somehow, my system is inserting duplicates, and someone (years ago) added that GROUP BY instead of hunting down the bug.
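A check along these lines (a sketch, assuming `event_id` is the column that should be unique in events_locations, given the 1:1 relationship) surfaces such duplicates:

```sql
-- event_ids that appear more than once in events_locations;
-- a 1:1 relationship implies each should appear exactly once.
SELECT event_id, COUNT(*) AS copies
FROM events_locations
GROUP BY event_id
HAVING COUNT(*) > 1;
```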

So, I will mark this as solved, more or less...
