Optimizing query with GROUP BY to remove Using temporary; Using filesort

I am using MySQL 5.6.13.2 and have a query that involves 150,000 rows in a parent table and over 1M rows in a child table. The query takes 2 seconds if I remove the GROUP BY (just as a test) and over 6 seconds with the GROUP BY, which is needed.

I've read other posts about how to remove Using temporary; Using filesort, but they do not address this issue. I'm hoping to get some help here.

A SQL fiddle that demonstrates all this is available here: http://sqlfiddle.com/#!9/edeb6/1

CREATE TABLE `summary` (
   `RunID` int(10) unsigned NOT NULL AUTO_INCREMENT,
   `LastUpdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
   `FileName` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
   `XCount` int(11) DEFAULT NULL,
   `YCount` int(11) DEFAULT NULL,
   `AccountID` varchar(25) COLLATE utf8_unicode_ci DEFAULT NULL,
   PRIMARY KEY (`RunID`),
   KEY `acct-lastupdate` (`AccountID`,`LastUpdate`),
   KEY `acct-lastupdate-counts` (`AccountID`,`LastUpdate`,`XCount`,`YCount`)
   ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;



CREATE TABLE `detail` (
  `DetailID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `LastUpdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `RunID` int(10) unsigned DEFAULT NULL,
  `TestID` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
  `ResultCode` int(11) DEFAULT NULL,
   PRIMARY KEY (`DetailID`),
  KEY `detail_runid` (`RunID`),
  KEY `detail_testid` (`TestID`),
  KEY `detail_runid_testid_result` (`RunID`,`TestID`,`ResultCode`)
  ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Here is the EXPLAIN output of my query:

EXPLAIN select
      testid as 'TestID',
      sum(case when resultcode = 1 then 1 else 0 end) as Category1,
      sum(case when resultcode = 2 then 1 else 0 end) as Category2,
      sum(case when resultcode = 0 then 1 else 0 end) as Category3
      from detail d, summary s
      where s.accountid = 'xyz'
        and s.lastupdate >= '2014-05-26 00:00:00'
        and s.lastupdate < '2014-07-27 00:00:00'
        and s.runid = d.runid
        and s.runid <= 9999999999
      GROUP BY testid;

id  select_type  table  type  possible_keys                                   key              key_len  ref                 rows  Extra
1   SIMPLE       s      ref   PRIMARY,acct-lastupdate,acct-lastupdate-counts  acct-lastupdate  78       const               2     Using where; Using index; Using temporary; Using filesort
1   SIMPLE       d      ref   detail_runid,detail_runid_testid_result         detail_runid     5        db_9_edeb6.s.RunID  1     (null)

If I remove the GROUP BY, then EXPLAIN says Using where; Using index, with no temporary table or filesort, and the query runs in 2 seconds instead of 6 seconds.

Having these results grouped by the Test ID is mandatory. Also, the Test ID values are arbitrary and not known in advance, so there is no way to write the query with subqueries against hardcoded known test IDs.

Is it possible to define other indexes that would avoid the temporary table and filesort? If not, is there a more creative way to rewrite this query that would be more efficient and perhaps resolve that?

Note that after the GROUP BY my real query also has HAVING and ORDER BY clauses (specifically, it ends with "... GROUP BY testid HAVING Category1 OR Category2 OR Category3 ORDER BY Category1 DESC, Category2 DESC;"). I left these out of the examples here because I get the same performance and EXPLAIN output with or without them, and I wanted to keep the sample as simple as possible. I mention them so that, if you have a creative way to rewrite the query, you can include them.
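For reference, the complete query described above would look roughly like the sketch below. It is assembled from the EXPLAIN example and the HAVING/ORDER BY clauses quoted in the note. The explicit JOIN syntax is a stylistic substitution for the comma join and is not expected to change the plan.

```sql
-- Sketch of the full query: the sample query plus the HAVING and ORDER BY
-- clauses mentioned in the note. Table and column names come from the schema above.
SELECT
    d.testid AS TestID,
    SUM(CASE WHEN d.resultcode = 1 THEN 1 ELSE 0 END) AS Category1,
    SUM(CASE WHEN d.resultcode = 2 THEN 1 ELSE 0 END) AS Category2,
    SUM(CASE WHEN d.resultcode = 0 THEN 1 ELSE 0 END) AS Category3
FROM summary AS s
JOIN detail AS d ON d.runid = s.runid
WHERE s.accountid = 'xyz'
  AND s.lastupdate >= '2014-05-26 00:00:00'
  AND s.lastupdate <  '2014-07-27 00:00:00'
  AND s.runid <= 9999999999
GROUP BY d.testid
-- Keep only test IDs with at least one non-zero category count.
HAVING Category1 OR Category2 OR Category3
ORDER BY Category1 DESC, Category2 DESC;
```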

As noted, there is an SQL fiddle at http://sqlfiddle.com/#!9/edeb6/1 that demonstrates the issue (so you can see the EXPLAIN output and experiment).

Thank you!

If it's an option, try adding the "accountid" field to the "detail" table. Then you don't need to join the "summary" table for this query: remove "summary" from the query and point the "s" alias references to "d". EXPLAIN then shows only Using where. I don't know whether it is significantly faster than yours, though.
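A minimal sketch of that denormalization, assuming schema changes to "detail" are acceptable. The index name `detail_acct_testid_result` and its column order are hypothetical and not part of the original suggestion; whether the optimizer uses the index to avoid the temporary table should be checked with EXPLAIN. Note also that filtering on d.lastupdate is not the same as filtering on s.lastupdate, because the two tables keep separate timestamps, so the rewritten query only matches the original if those values line up for your data.

```sql
-- Assumption: adding a column and an index to `detail` is allowed.
-- The index name and column order are illustrative only.
ALTER TABLE detail
    ADD COLUMN AccountID varchar(25) COLLATE utf8_unicode_ci DEFAULT NULL,
    ADD KEY detail_acct_testid_result (AccountID, TestID, ResultCode, LastUpdate);

-- Backfill AccountID from the parent table.
-- LastUpdate is assigned to itself so that ON UPDATE CURRENT_TIMESTAMP
-- does not overwrite the existing timestamps during the backfill.
UPDATE detail AS d
JOIN summary AS s ON s.RunID = d.RunID
SET d.AccountID  = s.AccountID,
    d.LastUpdate = d.LastUpdate;

-- The query with the join removed and the former "s" references pointed at "d".
-- Caveat: d.lastupdate is the detail row's own timestamp, not the summary's.
SELECT
    d.testid AS TestID,
    SUM(CASE WHEN d.resultcode = 1 THEN 1 ELSE 0 END) AS Category1,
    SUM(CASE WHEN d.resultcode = 2 THEN 1 ELSE 0 END) AS Category2,
    SUM(CASE WHEN d.resultcode = 0 THEN 1 ELSE 0 END) AS Category3
FROM detail AS d
WHERE d.accountid = 'xyz'
  AND d.lastupdate >= '2014-05-26 00:00:00'
  AND d.lastupdate <  '2014-07-27 00:00:00'
  AND d.runid <= 9999999999
GROUP BY d.testid;
```

New rows in "detail" would also need AccountID populated on insert, which this sketch does not cover.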

Also, the expressions of the form "sum(case when resultcode = 1 then 1 else 0 end)" can be written more briefly as "sum(resultcode=1) as Category1, sum(resultcode=2) as Category2 ..."
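Applied to the original query, the shorter form would look like the sketch below. It relies on MySQL evaluating a comparison to 1 or 0, so it is not portable to every database. One difference to be aware of: when resultcode is NULL the comparison yields NULL, which SUM ignores, so a group whose result codes are all NULL returns NULL here where the CASE form returns 0.

```sql
-- Same query as in the question, with the CASE expressions replaced by
-- MySQL boolean sums (a true comparison counts as 1, a false one as 0).
SELECT
    d.testid AS TestID,
    SUM(d.resultcode = 1) AS Category1,
    SUM(d.resultcode = 2) AS Category2,
    SUM(d.resultcode = 0) AS Category3
FROM detail d, summary s
WHERE s.accountid = 'xyz'
  AND s.lastupdate >= '2014-05-26 00:00:00'
  AND s.lastupdate <  '2014-07-27 00:00:00'
  AND s.runid = d.runid
  AND s.runid <= 9999999999
GROUP BY d.testid;
```

This is a readability change only; it is not expected to affect the Using temporary; Using filesort steps in the plan.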
