
Optimize query with functions in `group by` key?

I am using MySQL 8.0 and need to optimize a slow query on a large table.

The table contains 11 million rows. Its structure:

CREATE TABLE `ccu` (
  `id` bigint NOT NULL,
  `app_id` int NOT NULL,
  `ccu` int NOT NULL,
  `audit_create` datetime NOT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `ccu_game_create_time_2a10bc69_idx` (`app_id`,`audit_create`) USING BTREE,
  KEY `ccu_audit_create_idx` (`audit_create`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci

My query is:

SELECT app_id, DATE(audit_create) cal_day, MAX(ccu) pcu, ROUND(AVG(ccu)) id_acu 
FROM ccu
WHERE audit_create BETWEEN DATE_SUB(DATE(NOW()), INTERVAL 29 DAY) AND DATE(NOW())
GROUP BY app_id, DATE(audit_create)

The query takes over 2 seconds. I added the BETWEEN ... AND ... condition to filter out the rows I don't need. However, the values stored in audit_create have the format yyyy-MM-dd HH:mm:ss, so I have to use the DATE() function. According to the execution plan, only the WHERE condition uses an index (and it still needs a temporary table); the GROUP BY clause does not use any index at all.

I have no right to alter the table structure to add a date column. Is it possible to optimize the query to reduce its run time?

I was able to eliminate the Using temporary by adding an expression index:

mysql> alter table ccu add key bk1 (app_id, (cast(audit_create as date)));
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain SELECT app_id, DATE(audit_create) cal_day, 
   MAX(ccu) pcu, ROUND(AVG(ccu)) id_acu  
 FROM ccu 
 WHERE date(audit_create) BETWEEN DATE_SUB(DATE(NOW()), INTERVAL 29 DAY) AND DATE(NOW()) 
 GROUP BY app_id, cast(audit_create as date)\G 
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ccu
   partitions: NULL
         type: index
possible_keys: bk1
          key: bk1
      key_len: 8
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: Using where

Unfortunately, that EXPLAIN report shows type: index, which is an index scan. In other words, it will examine every one of the 11 million index entries. That could make it slower than your original query.

The only other suggestion I have is to run this query once a day and store the result in a summary table. Running a 2-second query once a day so you can get the aggregate results quickly should be acceptable. But you said you don't have authority to add a column, so I guess you don't have authority to add a table either.
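If creating a table does turn out to be allowed, the summary-table idea could look roughly like the sketch below. The table name ccu_daily and its column names are hypothetical, and the daily job is assumed to run shortly after midnight from whatever scheduler is available.

```sql
-- Hypothetical summary table: one row per app per calendar day.
CREATE TABLE ccu_daily (
  app_id  int  NOT NULL,
  cal_day date NOT NULL,
  pcu     int  NOT NULL,   -- peak concurrent users for the day
  acu     int  NOT NULL,   -- rounded average concurrent users for the day
  PRIMARY KEY (app_id, cal_day)
) ENGINE=InnoDB;

-- Daily job: aggregate only yesterday's rows.
-- REPLACE makes the job safe to re-run for the same day.
REPLACE INTO ccu_daily (app_id, cal_day, pcu, acu)
SELECT app_id, DATE(audit_create), MAX(ccu), ROUND(AVG(ccu))
FROM ccu
WHERE audit_create >= CURDATE() - INTERVAL 1 DAY
  AND audit_create <  CURDATE()
GROUP BY app_id, DATE(audit_create);

-- The report then reads at most 29 small rows per app.
SELECT app_id, cal_day, pcu, acu AS id_acu
FROM ccu_daily
WHERE cal_day >= CURDATE() - INTERVAL 29 DAY
  AND cal_day <  CURDATE();
```

The daily job touches only one day of data, so it should be much cheaper than the 29-day aggregate. The report covers complete days only; today's partial day would have to be computed from ccu separately if it is needed.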

In that case, get a faster computer with more RAM.

Trivial improvement: DATE(NOW()) --> CURDATE()

Main improvement:

Get rid of id and change

PRIMARY KEY (`id`) USING BTREE,
UNIQUE KEY `ccu_game_create_time_2a10bc69_idx` (`app_id`,`audit_create`) USING BTREE,

to just

PRIMARY KEY (`app_id`,`audit_create`),

That avoids a secondary lookup for each row.
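As a single statement, that change might be written as follows. This is a sketch that assumes you are permitted to alter the table and that nothing else (application code, foreign keys) depends on id. It rebuilds the whole table, so try it on a copy first.

```sql
-- Replace the surrogate key with the natural key (app_id, audit_create).
-- The old UNIQUE index becomes redundant once it is the PRIMARY KEY.
ALTER TABLE ccu
  DROP PRIMARY KEY,
  DROP COLUMN id,
  DROP INDEX ccu_game_create_time_2a10bc69_idx,
  ADD PRIMARY KEY (app_id, audit_create);
```

Because InnoDB stores the row data in primary-key order, the ccu values then sit in the same B-tree as (app_id, audit_create), which is what removes the extra lookup.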

There seem to be 2.4M rows (out of 11M) in the 29-day range. The Optimizer had to decide whether to use the index (which it did) and suffer 2.4M extra lookups, or to scan all 11M rows, which would necessitate an extra sort.

Another thing to check is innodb_buffer_pool_size. If the table is so big that it won't fit in that cache, there may be a lot of I/O. (Again, my index change will help with that.)
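A quick way to compare the two sizes is sketched below. It assumes the current default database is the one that holds ccu; the sizes reported by information_schema are estimates for InnoDB, so treat the result as a rough guide.

```sql
-- Configured buffer pool size, in MB.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Approximate size of the table's data and indexes, in MB.
SELECT ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name   = 'ccu';
```

If data_mb plus index_mb is well above buffer_pool_mb, a scan of the table is likely to hit disk.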

Yes, Bill's generated column is likely to add more performance, independently of my suggestion.

Caution:
Your range is 29 days + 1 second.
Bill's range is 30 days.

Regardless of the datatype of audit_create, this gets exactly the 29 days before this morning:

WHERE audit_create >= CURDATE() - INTERVAL 29 DAY
  AND audit_create  < CURDATE()
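Putting the CURDATE() change and the half-open range together, the original query would read roughly as follows (a sketch; the aliases are the ones from the question):

```sql
SELECT app_id,
       DATE(audit_create) AS cal_day,
       MAX(ccu)           AS pcu,
       ROUND(AVG(ccu))    AS id_acu
FROM ccu
-- Half-open range: includes midnight 29 days ago, excludes midnight today.
WHERE audit_create >= CURDATE() - INTERVAL 29 DAY
  AND audit_create <  CURDATE()
GROUP BY app_id, DATE(audit_create);
```

Unlike BETWEEN, which includes both endpoints, this form does not pick up a row stamped exactly 00:00:00 today, so every cal_day in the result is a complete day.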
