
MySQL performance, subquery using temporary filesort when query uses order by/group by

In making the tag tables for an archive of user-created game maps, the SQL for getting the map ids of maps containing all provided tags is, with ... being the tags and # being the number of tags:

SELECT DISTINCT map_id 
FROM `map_tag` 
INNER JOIN `tag` USING (tag_id) 
WHERE tag IN (...) 
GROUP BY map_id HAVING COUNT(DISTINCT tag_id) = #
ORDER BY map_id DESC

/* Affected rows: 0  Found rows: 83,597  Warnings: 0  Duration for 1 query: 0.032 sec. (+ 0.531 sec. network) */

+----+-------------+---------+-------+---------------+---------+---------+-------+--------+--------------------------+
| id | select_type | table   | type  | possible_keys | key     | key_len | ref   | rows   | Extra                    |
+----+-------------+---------+-------+---------------+---------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | tag     | const | PRIMARY,tag   | tag     | 767     | const |      1 | Using index              |
|  1 | SIMPLE      | map_tag | index | NULL          | PRIMARY | 8       | NULL  | 888729 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+-------+--------+--------------------------+

I then join the maps themselves and the SQL becomes:

SELECT 
    `map`.*
FROM (
    SELECT DISTINCT map_id 
    FROM `map_tag` 
    INNER JOIN `tag` USING (tag_id) 
    WHERE tag IN (...) 
    GROUP BY map_id HAVING COUNT(DISTINCT tag_id) = #
    ORDER BY map_id DESC
) matching 
INNER JOIN `map` USING (map_id)
INNER JOIN `map_tag` USING (map_id) 
INNER JOIN `tag` USING (tag_id) 
LIMIT 0, 10

/* Affected rows: 0  Found rows: 10  Warnings: 0  Duration for 1 query: 0.297 sec. */

+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                       | rows   | Extra                    |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL                      |  83597 |                          |
|  1 | PRIMARY     | map        | eq_ref | PRIMARY       | PRIMARY | 4       | matching.map_id           |      1 |                          |
|  1 | PRIMARY     | map_tag    | ref    | PRIMARY       | PRIMARY | 4       | matching.map_id           |      2 | Using index              |
|  1 | PRIMARY     | tag        | eq_ref | PRIMARY       | PRIMARY | 4       | maps.local.map_tag.tag_id |      1 | Using index              |
|  2 | DERIVED     | tag        | const  | PRIMARY,tag   | tag     | 767     |                           |      1 | Using index              |
|  2 | DERIVED     | map_tag    | index  | NULL          | PRIMARY | 8       | NULL                      | 888729 | Using where; Using index |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------+

The problem arises now when I want to actually use the tags.

SELECT 
    `map`.*,
    GROUP_CONCAT(`tag`.tag) AS tags
FROM (
    SELECT DISTINCT map_id 
    FROM `map_tag` 
    INNER JOIN `tag` USING (tag_id) 
    WHERE tag IN (...) 
    GROUP BY map_id HAVING COUNT(DISTINCT tag_id) = #
    ORDER BY map_id DESC
) matching 
INNER JOIN `map` USING (map_id)
INNER JOIN `map_tag` USING (map_id) 
INNER JOIN `tag` USING (tag_id) 
GROUP BY map_id
LIMIT 0, 10

/* Affected rows: 0  Found rows: 10  Warnings: 0  Duration for 1 query: 47.641 sec. */

+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+---------------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref                       | rows   | Extra                           |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL                      |  83597 | Using temporary; Using filesort |
|  1 | PRIMARY     | map        | eq_ref | PRIMARY       | PRIMARY | 4       | matching.map_id           |      1 |                                 |
|  1 | PRIMARY     | map_tag    | ref    | PRIMARY       | PRIMARY | 4       | matching.map_id           |      2 | Using index                     |
|  1 | PRIMARY     | tag        | eq_ref | PRIMARY       | PRIMARY | 4       | maps.local.map_tag.tag_id |      1 |                                 |
|  2 | DERIVED     | tag        | const  | PRIMARY,tag   | tag     | 767     |                           |      1 | Using index                     |
|  2 | DERIVED     | map_tag    | index  | NULL          | PRIMARY | 8       | NULL                      | 888729 | Using where; Using index        |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+---------------------------------+

A 47 second query, up from 0.3 seconds before the GROUP_CONCAT and GROUP BY were added. The plan now shows "Using temporary; Using filesort" on the derived table, and I have no idea why. I have indexes set up for the map_id in all the relevant tables, but for some reason it doesn't use them when doing the GROUP BY. ORDER BY also causes this behavior.

Is there something I need to do to alter the tables so that the indexes are used? Is there a more efficient way of bringing the map table in and obtaining all tags, not just the ones matching?


The goal is to have, if there are three maps (this is not indicative of table structure, tags is the map to map_tag to tag table relationship):

+-------+---------------+
| name  |     tags      |
+-------+---------------+
| map A | aaa, bbb, ccc |
| map B | bbb, ccc, zzz |
| map C | ccc, zzz, yyy |
+-------+---------------+

that if I search for tags "bbb" and "ccc" I get as a result:

+-------+---------------+
| name  |     tags      |
+-------+---------------+
| map A | aaa, bbb, ccc |
| map B | bbb, ccc, zzz |
+-------+---------------+

with all tags belonging to each map, instead of just the ones matched, and that I am able to sort the resulting map rows by map columns without MySQL ignoring the indexes:

...
ORDER BY `map`.published DESC

/* Affected rows: 0  Found rows: 10  Warnings: 0  Duration for 1 query: 00:01:35 (+ 0.078 sec. network) */
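For reference, the three-map example above can be reproduced with a small script. This is only a sketch: the column types, the `name` column and the ids are assumptions made for illustration. The keys are guessed from the EXPLAIN output (a `tag` index on `tag`.tag, and an 8-byte PRIMARY key on `map_tag`, which would be consistent with two 4-byte integer columns).

```sql
-- Hypothetical minimal schema; types, ids and the `name` column are assumptions.
CREATE TABLE `map` (
    map_id INT UNSIGNED NOT NULL PRIMARY KEY,
    name   VARCHAR(64)  NOT NULL
);
CREATE TABLE `tag` (
    tag_id INT UNSIGNED NOT NULL PRIMARY KEY,
    tag    VARCHAR(255) NOT NULL,
    UNIQUE KEY tag (tag)
);
CREATE TABLE `map_tag` (
    map_id INT UNSIGNED NOT NULL,
    tag_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (map_id, tag_id)
);

-- The three maps and their tags from the example tables.
INSERT INTO `map` VALUES (1, 'map A'), (2, 'map B'), (3, 'map C');
INSERT INTO `tag` VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (4, 'zzz'), (5, 'yyy');
INSERT INTO `map_tag` VALUES
    (1, 1), (1, 2), (1, 3),   -- map A: aaa, bbb, ccc
    (2, 2), (2, 3), (2, 4),   -- map B: bbb, ccc, zzz
    (3, 3), (3, 4), (3, 5);   -- map C: ccc, zzz, yyy

-- Maps carrying BOTH searched tags: ... is 'bbb', 'ccc' and # is 2.
SELECT map_id
FROM `map_tag`
INNER JOIN `tag` USING (tag_id)
WHERE tag IN ('bbb', 'ccc')
GROUP BY map_id
HAVING COUNT(DISTINCT tag_id) = 2
ORDER BY map_id DESC;
-- returns map_id 2 and 1 (map B and map A); map C matches only 'ccc'
```

Note that GROUP BY map_id already yields one row per map, so the DISTINCT in the original SELECT should be redundant.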

I am not sure I fully understand your question or your replies in the comments, HOWEVER... I would try to structure it this way: make the inner query a join between the map_tag and tag tables, do the GROUP_CONCAT(DISTINCT ...) there grouped by map id, and apply the HAVING test for the qualifying tags in the same place. Done... Now you can just join to your map table on the map ids that qualified.

To help the optimizer, I suggest the following indexes:

table       index
map_tag     ( map_id, tag_id )
tag         ( tag_id, tag )
map         ( map_id )
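As a hedged sketch of how this might look in DDL: the EXPLAIN output in the question suggests that map_tag's PRIMARY key already covers (map_id, tag_id) and that map is already keyed on map_id, so those two may need no change. What the plan does show is a full index scan of map_tag (type "index", 888729 rows) when filtering by tag, so a reverse index leading with tag_id is worth trying. The index name tag_map is made up here, and whether the optimizer picks it up depends on the data and the MySQL version.

```sql
-- Assumption: PRIMARY KEY (map_id, tag_id) already exists on map_tag,
-- so only the reverse (tag_id, map_id) index is added. The name is hypothetical.
ALTER TABLE `map_tag` ADD INDEX tag_map (tag_id, map_id);

-- Re-check the plan afterwards, here with the example tags 'bbb' and 'ccc'.
-- The hope is that map_tag is reached by an index lookup on tag_id
-- instead of a scan over the whole PRIMARY index.
EXPLAIN
SELECT map_id
FROM `map_tag`
INNER JOIN `tag` USING (tag_id)
WHERE tag IN ('bbb', 'ccc')
GROUP BY map_id
HAVING COUNT(DISTINCT tag_id) = 2;
```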

SELECT
      m.*,
      PreTags.allTags
   from
      ( SELECT 
              mt.map_id,
              GROUP_CONCAT(DISTINCT t.tag ORDER BY t.tag SEPARATOR ',') allTags
           FROM 
              map_tag mt
                 JOIN `tag` t
                    ON mt.tag_id = t.tag_id
           group by
              mt.map_id
           having 
              SUM( case when t.tag in (...) then 1 else 0 end ) = #
           order by
              mt.map_id DESC ) PreTags
         JOIN map m
            ON PreTags.map_id = m.map_id
   limit 
      0, 10

This way, the inner query does both the GROUP_CONCAT and the HAVING for you, so you don't have to reapply either in the outer query when getting the final map entries... and since the inner query is grouped by map_id, it will not return duplicates.

HERE IS ANOTHER OPTION. I would be curious about its performance.

SELECT
      m.*,
      FullTags.allTags
   from 
      ( SELECT
              Just10.map_id,
              GROUP_CONCAT(DISTINCT t.tag ORDER BY t.tag SEPARATOR ',') allTags
           from 
              ( SELECT mt.map_id
                   FROM map_tag mt
                   where mt.tag_id in ( select t.tag_id
                                           from `tag` t
                                           where t.tag in (...) )
                   group by mt.map_id
                   having COUNT(*) = #
                   order by mt.map_id DESC
                   limit 0, 10 ) Just10
                 JOIN map_tag mt2
                    ON Just10.map_id = mt2.map_id
                    JOIN `tag` t
                       ON mt2.tag_id = t.tag_id
           group by
              Just10.map_id ) FullTags
      JOIN map m
         ON FullTags.map_id = m.map_id

The innermost query gets at most 10 map ids that match all of the tags you are looking for (# being the number of tags, as in your question), and applies the ORDER BY. Then, only for those 10 does it go back and do the GROUP_CONCAT() -- again, this is for a maximum of 10 records -- and finally it joins to get the rest of the map data.
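The question also asks for sorting by a map column such as `map`.published. Below is a sketch of how the limit-first idea might be adapted for that, not a tested solution. It uses the example tags 'bbb' and 'ccc', so the required count is 2. It assumes that (map_id, tag_id) is unique in map_tag and that tag names are unique, so COUNT(*) counts distinct matched tags. The alias top10 is made up for illustration.

```sql
SELECT
    m.*,
    GROUP_CONCAT(t.tag ORDER BY t.tag SEPARATOR ', ') AS tags
FROM (
    -- Step 1: find the ids of the 10 newest maps that carry every searched tag.
    -- map is joined here only so that the sort and LIMIT can use published.
    SELECT mt.map_id
    FROM `map_tag` mt
    INNER JOIN `tag` t ON t.tag_id = mt.tag_id
    INNER JOIN `map` m ON m.map_id = mt.map_id
    WHERE t.tag IN ('bbb', 'ccc')
    GROUP BY mt.map_id, m.published
    HAVING COUNT(*) = 2
    ORDER BY m.published DESC, mt.map_id DESC
    LIMIT 0, 10
) top10
-- Step 2: only for those ids, fetch the map row and ALL of its tags.
INNER JOIN `map` m ON m.map_id = top10.map_id
INNER JOIN `map_tag` mt2 ON mt2.map_id = top10.map_id
INNER JOIN `tag` t ON t.tag_id = mt2.tag_id
GROUP BY m.map_id
-- The derived table's order is not guaranteed to survive the joins,
-- so the sort is repeated on the (at most 10) final rows.
ORDER BY m.published DESC, m.map_id DESC;
```

The expensive part is still step 1, which has to group every matching map_tag row before it can sort and cut to 10. An index that leads with tag_id on map_tag may help there, but that would need to be confirmed with EXPLAIN on the real data.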
