
MySQL sorting with Using temporary; Using filesort

Here is the query I'm trying to run:

SELECT c.creative_id, c.creative_title, c.creative_image_name, c.gravity, c.ad_strength
FROM creatives AS c
INNER JOIN term_relationships AS tr ON c.creative_id = tr.creative_id
WHERE tr.term_id
IN ( 14, 1, 50, 76, 104 )
GROUP BY c.creative_id
HAVING COUNT(tr.term_id ) =5
ORDER BY c.gravity ASC 
LIMIT 30;

Here is the EXPLAIN output for this query:

[image: EXPLAIN output for the query above]

Here is the creatives table structure:

CREATE TABLE `creatives` (
  `creative_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `scraper_id` bigint(20) unsigned DEFAULT NULL,
  `creative_title` varchar(255) NOT NULL,
  `creative_image_name` varchar(255) DEFAULT NULL,
  `image_attrib` varchar(12) DEFAULT NULL,
  `original_image_name` varchar(255) DEFAULT NULL,
  `creative_subtext` varchar(255) DEFAULT NULL,
  `dest_url` varchar(2083) NOT NULL,
  `lp_url` varchar(2083) NOT NULL,
  `lp_image_name` varchar(255) DEFAULT NULL,
  `lp_image_flag` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `creative_first_seen` date NOT NULL,
  `creative_last_seen` date NOT NULL,
  `daily_ad_count` int(5) unsigned NOT NULL,
  `ad_strength` int(11) unsigned NOT NULL,
  `prev_ad_strength` int(11) unsigned DEFAULT NULL,
  `gravity` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`creative_id`),
  KEY `gravity` (`gravity`)
) ENGINE=InnoDB AUTO_INCREMENT=173037591 DEFAULT CHARSET=utf8

I'm concerned about the Using temporary; Using filesort that appears when the query has both a GROUP BY and an ORDER BY on another column. If I remove the ORDER BY, the temporary table and the filesort go away and the query runs really fast.

What I don't understand is why MySQL needs a temporary table. Why can't it first apply the WHERE filter and sort by c.gravity, then group the resulting rows and filter them according to the HAVING clause? The filtered result would still be correctly sorted by c.gravity, because the gravity value is unchanged by the grouping and the HAVING filter.

What I tried:

  1. Selected everything without ORDER BY, wrapped it in a subquery, and joined that back to the creatives table. Same result: Using temporary, Using filesort, and slow.

  2. Tried adding FORCE INDEX FOR ORDER BY (gravity), and it doesn't change anything. The EXPLAIN output and the execution time remain the same.

UPDATE: the question has been answered by @Rick, and his correlated subquery without GROUP BY really is much faster. Here is the EXPLAIN output for his query:

[image: EXPLAIN output for @Rick's query]

And the output of SHOW CREATE TABLE term_relationships with the newly created index:

[image: SHOW CREATE TABLE term_relationships output]

And one more question to @Rick: why do we need the outer query with c3? It seems to join creatives to itself once more only to get the values of the other columns and to order the records by gravity. However, they are already sorted by the inner query, and we could simply add the missing columns to c1, giving:

SELECT  c1.creative_id,c1.creative_title,c1.creative_image_name,c1.gravity, c1.ad_strength
            FROM  creatives AS c1
            WHERE  
              ( SELECT  COUNT(*)
                    FROM  term_relationships
                    WHERE  c1.creative_id = creative_id
                      AND  term_id IN ( 14, 1, 50, 76, 104 )
              ) = 5 
            ORDER BY  c1.gravity ASC
            LIMIT  30;

Is my understanding correct, or am I missing something in your query?

Temp tables and filesorts are not the villains, per se. What matters is how bulky they are.

This may look more complex, but it may be faster:

SELECT  c3.creative_id,
        c3.creative_title, c3.creative_image_name,
        c3.gravity, c3.ad_strength
    FROM  
      ( SELECT  creative_id
            FROM  creatives AS c1
            WHERE  
              ( SELECT  COUNT(*)
                    FROM  term_relationships
                    WHERE  c1.creative_id = creative_id
                      AND  term_id IN ( 14, 1, 50, 76, 104 )
              ) = 5 
            ORDER BY  c1.gravity ASC
            LIMIT  30
      ) AS c2
    JOIN  creatives c3 USING (creative_id)
    ORDER BY  c3.gravity 

If it happens to use INDEX(gravity) for the inner query, then it will stop after finding 30 rows that have all 5 terms. If it generates a tmp table instead, the table will hold only 30 rows, much better than with your original query. Note also that the tmp table will be narrower: only creative_id will be in it. It then reaches back into creatives to get the rest of the desired columns. Finally, there will be another sort, but over only 30 rows.
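
One way to check which of these cases applies is to run EXPLAIN on the inner query by itself. This is only a sketch; the plan the optimizer picks depends on the MySQL version and on the table statistics, so the result may differ from server to server.

```sql
-- The inner query from the answer, run on its own to inspect its plan.
EXPLAIN
SELECT  creative_id
    FROM  creatives AS c1
    WHERE
      ( SELECT  COUNT(*)
            FROM  term_relationships
            WHERE  c1.creative_id = creative_id
              AND  term_id IN ( 14, 1, 50, 76, 104 )
      ) = 5
    ORDER BY  c1.gravity ASC
    LIMIT  30;
```

If the row for c1 shows gravity in the key column and its Extra column does not mention Using filesort, the rows are most likely being read in index order, so the scan can stop as soon as 30 matches have been found.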

Furthermore, "filesort" is often a very fast sort in RAM, not really a "file" sort. I'm pretty sure the sort in my query will not go to disk.
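
A rough way to check this on your own server is to look at the session status counters before and after running the query and compare the values. Treat it as a hint rather than a definitive diagnosis.

```sql
-- Run these before and after the query under test and compare the numbers.

-- Number of merge passes the sort had to do; it stays unchanged
-- when the sort fit in the sort buffer.
SHOW SESSION STATUS LIKE 'Sort_merge_passes';

-- Number of internal temporary tables that were created on disk.
SHOW SESSION STATUS LIKE 'Created_tmp_disk_tables';

-- Size of the in-memory buffer available to each sort.
SHOW VARIABLES LIKE 'sort_buffer_size';
```

If neither counter moves while the query runs, the temporary table and the sort were presumably handled in memory.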

term_relationships needs this composite index: INDEX(creative_id, term_id).
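
A minimal sketch of adding that index follows. The index name idx_creative_term is a made-up name for illustration, and since the definition of term_relationships is not shown in the question, check first whether an equivalent key (for example a primary key on the same two columns in the same order) already exists.

```sql
-- idx_creative_term is a hypothetical name; any unused index name will do.
ALTER TABLE term_relationships
    ADD INDEX idx_creative_term (creative_id, term_id);
```

The correlated subquery only touches creative_id and term_id, so with this index in place it should be answerable from the index alone, which EXPLAIN normally reports as Using index for that table.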
