MySQL sorting with Using temporary; Using filesort
Here is the query I'm trying to run:
SELECT c.creative_id, c.creative_title, c.creative_image_name, c.gravity, c.ad_strength
FROM creatives AS c
INNER JOIN term_relationships AS tr ON c.creative_id = tr.creative_id
WHERE tr.term_id IN (14, 1, 50, 76, 104)
GROUP BY c.creative_id
HAVING COUNT(tr.term_id) = 5
ORDER BY c.gravity ASC
LIMIT 30;
Here is what EXPLAIN outputs for this query:
Here is the creatives table structure:
CREATE TABLE `creatives` (
  `creative_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `scraper_id` bigint(20) unsigned DEFAULT NULL,
  `creative_title` varchar(255) NOT NULL,
  `creative_image_name` varchar(255) DEFAULT NULL,
  `image_attrib` varchar(12) DEFAULT NULL,
  `original_image_name` varchar(255) DEFAULT NULL,
  `creative_subtext` varchar(255) DEFAULT NULL,
  `dest_url` varchar(2083) NOT NULL,
  `lp_url` varchar(2083) NOT NULL,
  `lp_image_name` varchar(255) DEFAULT NULL,
  `lp_image_flag` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `creative_first_seen` date NOT NULL,
  `creative_last_seen` date NOT NULL,
  `daily_ad_count` int(5) unsigned NOT NULL,
  `ad_strength` int(11) unsigned NOT NULL,
  `prev_ad_strength` int(11) unsigned DEFAULT NULL,
  `gravity` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`creative_id`),
  KEY `gravity` (`gravity`)
) ENGINE=InnoDB AUTO_INCREMENT=173037591 DEFAULT CHARSET=utf8
I'm concerned about Using temporary; Using filesort, which appears when the query combines GROUP BY with ORDER BY on another column. If I remove ORDER BY, the temporary table and the filesort go away and the query runs really fast.
What I don't understand is why MySQL needs a temporary table. Why can't it first apply the WHERE filter and sort by c.gravity, then group the resulting rows and filter them according to the HAVING clause? The filtered result would still be sorted correctly by c.gravity, since the gravity value is unchanged by the grouping and the HAVING filter.
What I tried:
Selecting everything without ORDER BY, wrapping it in a subquery, and joining it again to the creatives table: same result, with Using temporary, Using filesort, and a slow query.
Adding FORCE INDEX FOR ORDER BY (gravity): it doesn't change anything. EXPLAIN and the execution time remain the same.
UPDATE: the question has been answered by @Rick, and the query really is much faster with his correlated subquery and without GROUP BY. I'm adding the EXPLAIN output for the query here:
And the output of SHOW CREATE TABLE term_relationships with the newly created index:
One more question for @Rick: why do we need the outer query with c3? It seems to join creatives to itself once more just to get the values of the other columns and to order the records by gravity. However, the rows are already sorted by the inner query, and we can easily add the missing columns to c1, making it:
SELECT c1.creative_id, c1.creative_title, c1.creative_image_name, c1.gravity, c1.ad_strength
FROM creatives AS c1
WHERE
    ( SELECT COUNT(*)
      FROM term_relationships
      WHERE c1.creative_id = creative_id
        AND term_id IN (14, 1, 50, 76, 104)
    ) = 5
ORDER BY c1.gravity ASC
LIMIT 30;
Is my understanding correct, or am I missing something in your query?
Temp tables and filesorts are not the villains per se. What matters is how bulky they are.
This may look more complex, but it may be faster:
SELECT c3.creative_id,
       c3.creative_title, c3.creative_image_name,
       c3.gravity, c3.ad_strength
FROM
    ( SELECT creative_id
      FROM creatives AS c1
      WHERE
          ( SELECT COUNT(*)
            FROM term_relationships
            WHERE c1.creative_id = creative_id
              AND term_id IN (14, 1, 50, 76, 104)
          ) = 5
      ORDER BY c1.gravity ASC
      LIMIT 30
    ) AS c2
JOIN creatives c3 USING (creative_id)
ORDER BY c3.gravity;
If it happens to use INDEX(gravity) for the inner query, it will stop after finding 30 rows that have all 5 terms. If it generates a tmp table, that table will have only 30 rows, which is much better than with your original query. Note also that the tmp table will be narrower: only creative_id will be in it. It then reaches back into creatives to get the rest of the desired columns. Finally, there will be another sort, but of only 30 rows.
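One way to check whether the inner query really walks the gravity index is to run EXPLAIN on it alone. This is only a sketch: the optimizer's choice depends on the MySQL version and on the data distribution, so the plan on your server may differ.

```sql
-- EXPLAIN only the inner (derived-table) part of the query.
-- In the row for c1, a `key` value of `gravity` and no "Using filesort"
-- in `Extra` would suggest the rows are read in index order, so the scan
-- can stop once LIMIT 30 is satisfied.
EXPLAIN
SELECT creative_id
FROM creatives AS c1
WHERE
    ( SELECT COUNT(*)
      FROM term_relationships
      WHERE c1.creative_id = creative_id
        AND term_id IN (14, 1, 50, 76, 104)
    ) = 5
ORDER BY c1.gravity ASC
LIMIT 30;
```

If `key` is NULL or `Extra` shows "Using filesort" for c1, the early-stop behaviour described above does not apply to that plan.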
Furthermore, "filesort" is often a very fast sort in RAM, not really a "file" sort. I'm pretty sure my query will not go to disk.
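If you want to verify that rather than take it on faith, a rough check is to compare session status counters before and after running the query. The sketch below assumes the counters are read in the same connection that runs the query; the difference between the two readings is what matters, not the absolute values.

```sql
-- Read the counters before the query.
SHOW SESSION STATUS LIKE 'Created_tmp_disk_tables';
SHOW SESSION STATUS LIKE 'Sort_merge_passes';

-- Run the query under test in the same session.
SELECT c3.creative_id, c3.creative_title, c3.creative_image_name,
       c3.gravity, c3.ad_strength
FROM
    ( SELECT creative_id
      FROM creatives AS c1
      WHERE
          ( SELECT COUNT(*)
            FROM term_relationships
            WHERE c1.creative_id = creative_id
              AND term_id IN (14, 1, 50, 76, 104)
          ) = 5
      ORDER BY c1.gravity ASC
      LIMIT 30
    ) AS c2
JOIN creatives c3 USING (creative_id)
ORDER BY c3.gravity;

-- Read the counters again and compare with the first readings.
SHOW SESSION STATUS LIKE 'Created_tmp_disk_tables';
SHOW SESSION STATUS LIKE 'Sort_merge_passes';
```

An unchanged Created_tmp_disk_tables suggests no temporary table was written to disk, and an unchanged Sort_merge_passes suggests the sort fit in the sort buffer.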
term_relationships needs this composite index: INDEX(creative_id, term_id).
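A minimal sketch of adding that index follows. The index name creative_term is an arbitrary placeholder, not something from the original schema, and on a large table the ALTER may take a while.

```sql
-- Composite index so the correlated subquery can look up each creative_id
-- and count its matching term_id values from the index alone.
-- `creative_term` is a hypothetical name; pick whatever fits your conventions.
ALTER TABLE term_relationships
    ADD INDEX creative_term (creative_id, term_id);
```

With creative_id as the leading column, each execution of the correlated subquery becomes a short range scan over that one creative's rows in the index.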