简体   繁体   中英

MySQL sorting with Using temporary; Using filesort

Here is the query I'm trying to launch:

SELECT c.creative_id, c.creative_title, c.creative_image_name, c.gravity, c.ad_strength
FROM creatives AS c
INNER JOIN term_relationships AS tr ON c.creative_id = tr.creative_id
WHERE tr.term_id
IN ( 14, 1, 50, 76, 104 )
GROUP BY c.creative_id
HAVING COUNT(tr.term_id ) =5
ORDER BY c.gravity ASC 
LIMIT 30;

Here is what EXPLAIN for this query outputs:

在此处输入图片说明

Here is the creatives table structure:

CREATE TABLE `creatives` (
  `creative_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `scraper_id` bigint(20) unsigned DEFAULT NULL,
  `creative_title` varchar(255) NOT NULL,
  `creative_image_name` varchar(255) DEFAULT NULL,
  `image_attrib` varchar(12) DEFAULT NULL,
  `original_image_name` varchar(255) DEFAULT NULL,
  `creative_subtext` varchar(255) DEFAULT NULL,
  `dest_url` varchar(2083) NOT NULL,
  `lp_url` varchar(2083) NOT NULL,
  `lp_image_name` varchar(255) DEFAULT NULL,
  `lp_image_flag` tinyint(1) unsigned NOT NULL DEFAULT '0',
  `creative_first_seen` date NOT NULL,
  `creative_last_seen` date NOT NULL,
  `daily_ad_count` int(5) unsigned NOT NULL,
  `ad_strength` int(11) unsigned NOT NULL,
  `prev_ad_strength` int(11) unsigned DEFAULT NULL,
  `gravity` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`creative_id`),
  KEY `gravity` (`gravity`)
) ENGINE=InnoDB AUTO_INCREMENT=173037591 DEFAULT CHARSET=utf8

I'm concerned about Using temporary; using filesort Using temporary; using filesort when launching both with GROUP BY and ORDER BY on another column. If I remove ORDER BY , the temporary and filesort go away and the query runs really fast.

What I don't understand, why mysql needs temporary table, why can't it first where filter + sort by c.gravity , then group by the resulting table and filter according to HAVING clause. The filtered table will be sorted by c.gravity correctly as the gravity value remains unchanged after the grouping and having filter.

What I tried:

  1. Selected everything without ORDER BY , wrapped into a subquery and joined again on creatives table - same result, using temporary, filesort and slow

  2. tried to add FORCE USE INDEX FOR ORDER BY (gravity) and it doesn't change anything. EXPLAIN and execution time remain the same.

UPDATE : the question has been answered by @Rick and it's really much faster with his correlated subquery and not using GROUP BY . I'm adding here an EXPLAIN output for the query:

在此处输入图片说明

And the output of SHOW CREATE TABLE term_relationships with the newly created index:

在此处输入图片说明

And one more question to @Rick: why do we need the outer query with c3 ? It seems just to join creatives on its own one more just to get the values from other columns and order the records by gravity. However, they are already sorted with the inner query and we can easily add missing columns in c1 making it:

SELECT  c1.creative_id,c1.creative_title,c1.creative_image_name,c1.gravity, c1.ad_strength
            FROM  creatives AS c1
            WHERE  
              ( SELECT  COUNT(*)
                    FROM  term_relationships
                    WHERE  c1.creative_id = creative_id
                      AND  term_id IN ( 14, 1, 50, 76, 104 )
              ) = 5 
            ORDER BY  c1.gravity ASC
            LIMIT  30;

Is my understanding correct or am I missing something in your query?

Temp table and filesort are not the villains, per se. It's how bulky they are.

This may look more complex, but it may be faster:

SELECT  c3.creative_id,
        c3.creative_title, c3.creative_image_name,
        c3.gravity, c3.ad_strength
    FROM  
      ( SELECT  creative_id
            FROM  creatives AS c1
            WHERE  
              ( SELECT  COUNT(*)
                    FROM  term_relationships
                    WHERE  c1.creative_id = creative_id
                      AND  term_id IN ( 14, 1, 50, 76, 104 )
              ) = 5 
            ORDER BY  c1.gravity ASC
            LIMIT  30
      ) AS c2
    JOIN  creatives c3 USING (creative_id)
    ORDER BY  c3.gravity 

If it happens to use INDEX(gravity) for the inner query, then it will stop after finding 30 rows that have all 5 transactions. If it generates a tmp table, it will be only 30 rows -- much better than with your original query. Note also, that the tmp table will be narrower -- only creative_id will be in it. Finally it reaches back into creatives to get the rest of the desired columns. Finally, there will be another sort, but with only 30 rows.

Furthermore, "filesort" is often a very fast sort in RAM, not really a "file" sort. I'm pretty sure my query will not be on disk.

term_relationships needs this composite index: INDEX(creative_id, term_id) .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM