简体   繁体   中英

Optimizing query with GROUP BY to remove Using Temporary; Using Filesort

I am using mySQL 5.6.13.2 and have a query that involves 150,000 rows in a parent table with over 1M rows in a child table. The query takes 2 seconds if I remove the GROUP BY (just as a test) and over 6 seconds if I have the GROUP BY, which is needed.

I've read other posts about how to remove using temporary; using filesort but these do not address the issue. I'm hoping to get some help here please.

A SQL fiddle that demonstrates all this is available here: http://sqlfiddle.com/#!9/edeb6/1

CREATE TABLE `summary` (
   `RunID` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `LastUpdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
   `FileName` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
   `XCount` int(11) DEFAULT NULL,
   `YCount` int(11) DEFAULT NULL,
   `AccountID` varchar(25) COLLATE utf8_unicode_ci DEFAULT NULL,
   PRIMARY KEY (`RunID`),
   KEY `acct-lastupdate` (`AccountID`,`LastUpdate`),
   KEY `acct-lastupdate-counts` (`AccountID`,`LastUpdate`,`XCount`,`YCount`)
   ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;



CREATE TABLE `detail` (
  `DetailID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `LastUpdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `RunID` int(10) unsigned DEFAULT NULL,
  `TestID` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
  `ResultCode` int(11) DEFAULT NULL,
   PRIMARY KEY (`DetailID`),
  KEY `detail_runid` (`RunID`),
  KEY `detail_testid` (`TestID`),
  KEY `detail_runid_testid_result` (`RunID`,`TestID`,`ResultCode`)
  ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Here is the EXPLAIN output of my query:

EXPLAIN select
      testid as 'TestID',
      sum(case when resultcode = 1 then 1 else 0 end) as Category1,
      sum(case when resultcode = 2 then 1 else 0 end) as Category2,
      sum(case when resultcode = 0 then 1 else 0 end) as Category3
      from detail d, summary s
      where s.accountid = 'xyz'
        and s.lastupdate >= '2014-05-26 00:00:00'
        and s.lastupdate < '2014-07-27 00:00:00'
        and s.runid = d.runid
        and s.runid <= 9999999999
      GROUP BY testid;

 1  SIMPLE  s   ref PRIMARY,acct-lastupdate,acct-lastupdate-counts  acct-lastupdate 78  const   2   Using where; Using index; Using temporary; Using filesort
 1  SIMPLE  d   ref detail_runid,detail_runid_testid_result detail_runid    5   db_9_edeb6.s.RunID  1   (null)

If I remove the GROUP BY then the EXPLAIN says Using where; Using index with no temporary or file sort and the query runs in 2 seconds instead of 6 seconds.

Having these results grouped by the Test ID is mandatory. Also the Test ID values are arbitrary and not known in advance, so there would be no way to write the query with subqueries against hardcoded known test IDs.

Is it possible to define other indexes that may stop the temporary and file sort? If not, is there a more creative way to rewrite this query that would be more efficient and perhaps resolve that?

Note that after the GROUP BY my query really has some HAVING and ORDER BY conditions (specifically it goes ... GROUP BY testid having Category1 OR Category2 OR Category3 order by Category1 desc, Category2 desc;" - however I left this out of the examples here because I get the same performance and EXPLAIN output with or without that expanded clause and I wanted to keep the sample as simple as possible. I mention it here because in case you have a creative way to rewrite the query if you can please include that it would be good.

As noted, there is an SQL fiddle here http://sqlfiddle.com/#!9/edeb6/1 that demonstrates the issue (so you can see the EXPLAIN output and experiment).

Thank you!

If it's an option, then try to add the "accountid" field to the "detail" table. Then you dont need to join the summary table for this query. Remove the "summary" table from your query and point the "s" alias to "d". Then EXPLAIN shows only using where. But I don't know if it is significantly faster than yours.

And the statements "sum(case when resultcode = 1 then 1 else 0 end)" you can write shorter like this "sum(resultcode=1) as Category1, sum(reusltcode=2) as Category2 ..."

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM