
Optimise MySQL query with GROUP BY

I have an InnoDB table with 11 columns and around 5 million records, on which I run a query to find the top 10 records with the highest sum. The table schema is as below.

id (int 11) (primary key)
activity_id (varchar 250)
activity_type (varchar 10)
advertised_time (timestamp)
advertised_train_ident (int 11)
technical_train_ident (int 11)
location_signature (varchar 10)
time_at_location (timestamp)
information_owner (varchar 100)
created_at (timestamp)
updated_at (timestamp)

The indexes present in the table are

id - primary key
location_signature, activity_type, advertised_time - composite index (named search)

I am using the following query to pull records from the above table and it takes 10-12 seconds to complete the execution.

SELECT location_signature, activity_type,  
SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 THEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) else 0 END) as delay_time, 
count(id) as total_train_count, 
SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 THEN 1 ELSE 0 END) as delayed_train_count 
from `train_announcements` 
where `advertised_time` >= '2019-04-01 10:00:00' and `advertised_time` <= '2019-04-30 10:00:00' 
group by `location_signature`, `activity_type` 
order by `delay_time` desc 
limit 10 offset 0;

The EXPLAIN output for this query is as follows:

+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
| id | select_type | table                      | type  | possible_keys | key     | key_len | ref  | rows   | Extra                                        |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+
|  1 | SIMPLE      | train_announcements        | index | search        | search  | 84      | NULL | 4910024| Using where; Using temporary; Using filesort |
+----+-------------+----------------------------+-------+---------------+---------+---------+------+--------+----------------------------------------------+

Please note that the collation of this table is utf8mb4_unicode_ci because the field location_signature contains special characters.

It would be great if someone can suggest any workarounds to improve the performance of this query. Thanks in advance.

Looking at your index, make sure advertised_time is the leftmost column, because the WHERE clause filters on it.

It could also be useful to add time_at_location, so that the query can read everything it needs from the index and avoid accessing the table data.

index for table train_announcements

columns (advertised_time, location_signature,activity_type, time_at_location)
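As a rough sketch, the index described above could be created as follows. The index name idx_advertised_cover is made up for illustration, and whether the optimizer actually picks this index depends on the data and the MySQL version, so it is worth re-running EXPLAIN afterwards.

```sql
-- Hypothetical index name; the column order follows the suggestion above:
-- the range-filtered column first, then the GROUP BY columns, then
-- time_at_location so the query may be served from the index alone.
ALTER TABLE train_announcements
  ADD INDEX idx_advertised_cover
    (advertised_time, location_signature, activity_type, time_at_location);

-- Re-check the plan: look at the "key" column and at whether
-- "Using index" shows up in "Extra".
EXPLAIN
SELECT location_signature, activity_type, COUNT(*)
FROM train_announcements
WHERE advertised_time BETWEEN '2019-04-01 10:00:00' AND '2019-04-30 10:00:00'
GROUP BY location_signature, activity_type;
```

Building an index on a table of about 5 million rows takes some time, so this is probably best tried on a copy of the table first.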

SELECT location_signature
  , activity_type
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) 
            ELSE 0 END) as delay_time
  , count(id) as total_train_count
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN 1 
            ELSE 0 END) as delayed_train_count 
from `train_announcements` 
where `advertised_time` BETWEEN '2019-04-01 10:00:00' and '2019-04-30 10:00:00' 
group by `location_signature`, `activity_type` 
order by `delay_time` desc 
limit 10 offset 0;

If id is never NULL (as the primary key, it cannot be), try using COUNT(*) instead of COUNT(id):

SELECT location_signature
  , activity_type
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) 
            ELSE 0 END) as delay_time
  , count(*) as total_train_count
  , SUM(CASE WHEN TIMESTAMPDIFF(MINUTE,advertised_time, time_at_location) > 0 
            THEN 1 
            ELSE 0 END) as delayed_train_count 
from `train_announcements` 
where `advertised_time` BETWEEN '2019-04-01 10:00:00' and '2019-04-30 10:00:00' 
group by `location_signature`, `activity_type` 
order by `delay_time` desc 
limit 10 offset 0;

Or, if you really need id too, try adding this column to the composite index (InnoDB secondary indexes already store the primary key, so this may be redundant):

      (advertised_time, location_signature, activity_type, time_at_location, id )

Build and maintain a Summary table. It would have subtotals for, say, each day. Then the 'report' would be against this much smaller table, hence would be much faster.

More: http://mysql.rjweb.org/doc.php/summarytables
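A minimal sketch of what such a summary table might look like, assuming one row per day, location and activity type. The table name train_delay_daily and its column names are invented for this example, and the sketch assumes location_signature and activity_type are never NULL in the source table.

```sql
-- Hypothetical summary table: daily subtotals per location and activity type.
CREATE TABLE train_delay_daily (
  day                 DATE         NOT NULL,
  location_signature  VARCHAR(10)  NOT NULL,
  activity_type       VARCHAR(10)  NOT NULL,
  delay_minutes       INT UNSIGNED NOT NULL,
  train_count         INT UNSIGNED NOT NULL,
  delayed_count       INT UNSIGNED NOT NULL,
  PRIMARY KEY (day, location_signature, activity_type)
) DEFAULT CHARSET = utf8mb4 COLLATE = utf8mb4_unicode_ci;

-- Fill in one day of subtotals (for example from a nightly job).
INSERT INTO train_delay_daily
  (day, location_signature, activity_type,
   delay_minutes, train_count, delayed_count)
SELECT DATE(advertised_time), location_signature, activity_type,
       SUM(CASE WHEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location) > 0
                THEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location)
                ELSE 0 END),
       COUNT(*),
       SUM(CASE WHEN TIMESTAMPDIFF(MINUTE, advertised_time, time_at_location) > 0
                THEN 1 ELSE 0 END)
FROM train_announcements
WHERE advertised_time >= '2019-04-01 00:00:00'
  AND advertised_time <  '2019-04-02 00:00:00'
GROUP BY DATE(advertised_time), location_signature, activity_type;

-- The report then sums the daily subtotals instead of scanning raw rows.
SELECT location_signature, activity_type,
       SUM(delay_minutes) AS delay_time,
       SUM(train_count)   AS total_train_count,
       SUM(delayed_count) AS delayed_train_count
FROM train_delay_daily
WHERE day BETWEEN '2019-04-01' AND '2019-04-30'
GROUP BY location_signature, activity_type
ORDER BY delay_time DESC
LIMIT 10;
```

Note that daily buckets can only answer ranges that start and end on day boundaries. The original query filters from 10:00:00 to 10:00:00, so matching it exactly would need finer buckets (hourly, for instance) or a fallback to the raw table for the partial days.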

