In making the tag tables for an archive of user-created game maps, the SQL for getting the map ids of maps containing all provided tags is, with ... being the tags and # being the number of tags:
SELECT DISTINCT map_id
FROM `map_tag`
INNER JOIN `tag` USING (tag_id)
WHERE tag IN (...)
GROUP BY map_id HAVING COUNT(DISTINCT tag_id) = #
ORDER BY map_id DESC
/* Affected rows: 0 Found rows: 83,597 Warnings: 0 Duration for 1 query: 0.032 sec. (+ 0.531 sec. network) */
+----+-------------+---------+-------+---------------+---------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------+---------+-------+--------+--------------------------+
| 1 | SIMPLE | tag | const | PRIMARY,tag | tag | 767 | const | 1 | Using index |
| 1 | SIMPLE | map_tag | index | NULL | PRIMARY | 8 | NULL | 888729 | Using where; Using index |
+----+-------------+---------+-------+---------------+---------+---------+-------+--------+--------------------------+
I then join the maps themselves and the SQL becomes:
SELECT
`map`.*
FROM (
SELECT DISTINCT map_id
FROM `map_tag`
INNER JOIN `tag` USING (tag_id)
WHERE tag IN (...)
GROUP BY map_id HAVING COUNT(DISTINCT tag_id) = #
ORDER BY map_id DESC
) matching
INNER JOIN `map` USING (map_id)
INNER JOIN `map_tag` USING (map_id)
INNER JOIN `tag` USING (tag_id)
LIMIT 0, 10
/* Affected rows: 0 Found rows: 10 Warnings: 0 Duration for 1 query: 0.297 sec. */
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 83597 | |
| 1 | PRIMARY | map | eq_ref | PRIMARY | PRIMARY | 4 | matching.map_id | 1 | |
| 1 | PRIMARY | map_tag | ref | PRIMARY | PRIMARY | 4 | matching.map_id | 2 | Using index |
| 1 | PRIMARY | tag | eq_ref | PRIMARY | PRIMARY | 4 | maps.local.map_tag.tag_id | 1 | Using index |
| 2 | DERIVED | tag | const | PRIMARY,tag | tag | 767 | | 1 | Using index |
| 2 | DERIVED | map_tag | index | NULL | PRIMARY | 8 | NULL | 888729 | Using where; Using index |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+--------------------------+
The problem arises now when I want to actually use the tags.
SELECT
`map`.*,
GROUP_CONCAT(`tag`.tag) AS tags
FROM (
SELECT DISTINCT map_id
FROM `map_tag`
INNER JOIN `tag` USING (tag_id)
WHERE tag IN (...)
GROUP BY map_id HAVING COUNT(DISTINCT tag_id) = #
ORDER BY map_id DESC
) matching
INNER JOIN `map` USING (map_id)
INNER JOIN `map_tag` USING (map_id)
INNER JOIN `tag` USING (tag_id)
GROUP BY map_id
LIMIT 0, 10
/* Affected rows: 0 Found rows: 10 Warnings: 0 Duration for 1 query: 47.641 sec. */
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 83597 | Using temporary; Using filesort |
| 1 | PRIMARY | map | eq_ref | PRIMARY | PRIMARY | 4 | matching.map_id | 1 | |
| 1 | PRIMARY | map_tag | ref | PRIMARY | PRIMARY | 4 | matching.map_id | 2 | Using index |
| 1 | PRIMARY | tag | eq_ref | PRIMARY | PRIMARY | 4 | maps.local.map_tag.tag_id | 1 | |
| 2 | DERIVED | tag | const | PRIMARY,tag | tag | 767 | | 1 | Using index |
| 2 | DERIVED | map_tag | index | NULL | PRIMARY | 8 | NULL | 888729 | Using where; Using index |
+----+-------------+------------+--------+---------------+---------+---------+---------------------------+--------+---------------------------------+
A 47 second query, up from 0.3 seconds before adding the GROUP_CONCAT and the outer GROUP BY. The derived table (the subquery) now shows "Using temporary; Using filesort", and I have no idea why. I have indexes set up for the map_id in all the relevant tables, but for some reason it doesn't use them when doing the GROUP BY. ORDER BY also causes this behavior.
Is there something I need to do to alter the tables so that the indexes are used? Is there a more efficient way of bringing the map table in and obtaining all tags, not just the ones matching?
The goal is to have, if there are three maps (this is not indicative of table structure; tags is the map to map_tag to tag table relationship):
+-------+---------------+
| name | tags |
+-------+---------------+
| map A | aaa, bbb, ccc |
| map B | bbb, ccc, zzz |
| map C | ccc, zzz, yyy |
+-------+---------------+
that if I search for tags "bbb" and "ccc" I get as a result:
+-------+---------------+
| name | tags |
+-------+---------------+
| map A | aaa, bbb, ccc |
| map B | bbb, ccc, zzz |
+-------+---------------+
with all tags belonging to each map, instead of just the ones matched, and that I am able to sort the resulting map rows by map columns without MySQL ignoring the indexes:
...
ORDER BY `map`.published DESC
/* Affected rows: 0 Found rows: 10 Warnings: 0 Duration for 1 query: 00:01:35 (+ 0.078 sec. network) */
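To make the goal concrete, here is a minimal sketch that rebuilds the three-map example and runs the third query above with ... replaced by 'bbb', 'ccc' and # replaced by 2. The table definitions are assumptions (the real schema is not shown here; the column types are only guessed from the key_len values in the EXPLAIN output, and the name column comes from the illustration above), so treat it as a reproduction aid rather than the actual schema.

```sql
-- Assumed schema, for illustration only.
CREATE TABLE `map` (
  map_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name      VARCHAR(100) NOT NULL,      -- hypothetical column
  published DATETIME     NOT NULL
);

CREATE TABLE `tag` (
  tag_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  tag    VARCHAR(100) NOT NULL,
  UNIQUE KEY tag (tag)
);

CREATE TABLE `map_tag` (
  map_id INT UNSIGNED NOT NULL,
  tag_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (map_id, tag_id)
);

-- Sample data matching the three-map example.
INSERT INTO `map` (map_id, name, published) VALUES
  (1, 'map A', '2012-01-01 00:00:00'),
  (2, 'map B', '2012-01-02 00:00:00'),
  (3, 'map C', '2012-01-03 00:00:00');

INSERT INTO `tag` (tag_id, tag) VALUES
  (1, 'aaa'), (2, 'bbb'), (3, 'ccc'), (4, 'zzz'), (5, 'yyy');

INSERT INTO `map_tag` (map_id, tag_id) VALUES
  (1, 1), (1, 2), (1, 3),
  (2, 2), (2, 3), (2, 4),
  (3, 3), (3, 4), (3, 5);

-- Maps that carry both 'bbb' and 'ccc', listed with ALL of their tags.
SELECT
  `map`.name,
  GROUP_CONCAT(`tag`.tag ORDER BY `tag`.tag SEPARATOR ', ') AS tags
FROM (
  SELECT map_id
  FROM `map_tag`
  INNER JOIN `tag` USING (tag_id)
  WHERE tag IN ('bbb', 'ccc')
  GROUP BY map_id HAVING COUNT(DISTINCT tag_id) = 2
) matching
INNER JOIN `map` USING (map_id)
INNER JOIN `map_tag` USING (map_id)
INNER JOIN `tag` USING (tag_id)
GROUP BY `map`.map_id, `map`.name
ORDER BY `map`.name;
```

On this sample data the query should return map A and map B with their full tag lists, which is the result table shown above. The performance problem only shows up at the real table sizes.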
I don't fully follow the question or the replies in the comments; however, I would try to structure it this way. The inner query joins map_tag to the tag table, groups by map_id, builds the GROUP_CONCAT of distinct tags there, and filters with a HAVING count on the qualifying tags. After that, you only need to join the map table to the map ids that qualified.
To help the optimizer, I can suggest the following indexes:
table index
map_tag ( map_id, tag_id )
tag ( tag_id, tag )
map ( map_id )
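As a rough sketch, the suggested indexes could be created as follows. The index names are hypothetical. Judging by the EXPLAIN output in the question (PRIMARY on map_tag with key_len 8, PRIMARY on map with key_len 4), the primary keys may already cover ( map_id, tag_id ) and ( map_id ), so it is worth checking what exists before adding anything.

```sql
-- Inspect the existing indexes first; duplicates only slow down writes.
SHOW INDEX FROM `map_tag`;
SHOW INDEX FROM `tag`;
SHOW INDEX FROM `map`;

-- Only needed if map_tag has no index starting with (map_id, tag_id).
-- idx_map_tag is a hypothetical name.
ALTER TABLE `map_tag` ADD INDEX idx_map_tag (map_id, tag_id);

-- Covering index so the tag text can be read without touching the row.
-- idx_tag_id_tag is a hypothetical name.
ALTER TABLE `tag` ADD INDEX idx_tag_id_tag (tag_id, tag);

-- map (map_id) is normally the primary key already, so no statement is given for it.
```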
SELECT
m.*,
PreTags.allTags
from
( SELECT
mt.map_id,
GROUP_CONCAT(DISTINCT t.tag ORDER BY t.tag SEPARATOR ',') allTags
FROM
map_tag mt
JOIN `tag` t
ON mt.tag_id = t.tag_id
group by
mt.map_id
having
SUM( case when t.tag in (...) then 1 else 0 end ) > 1
order by
mt.map_id DESC ) PreTags
JOIN map m
ON PreTags.map_id = m.map_id
limit
0, 10
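This way, the inner query does both the GROUP_CONCAT and the HAVING, so you don't have to reapply them in the outer query when getting the final map entries. Since the inner query is grouped by map_id, you will not get duplicates from it. Note that the "> 1" test keeps maps with more than one of the requested tags; to require all of them, as in the question, compare the SUM to the number of tags (= #) instead.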
Here is another option; I would be curious about its performance.
SELECT
m.*,
FullTags.allTags
from
( SELECT
Just10.map_id,
GROUP_CONCAT(DISTINCT t.tag ORDER BY t.tag SEPARATOR ',') allTags
from
( SELECT mt.map_id
FROM map_tag mt
where mt.tag_id in ( select t.tag_id
from `tag` t
where t.tag in (...) )
group by mt.map_id
having COUNT(*) > 1
order by mt.map_id DESC
limit 0, 10 ) Just10
JOIN map_tag mt2
ON Just10.map_id = mt2.map_id
JOIN `tag` t
ON mt2.tag_id = t.tag_id
group by
Just10.map_id ) FullTags
JOIN map m
ON FullTags.map_id = m.map_id
The innermost query gets at most 10 map ids that have more than one of the tags you are looking for, and applies the ORDER BY there. Then, only for those 10 does it go back and build the GROUP_CONCAT(), and finally it joins to map to get the rest of the map data. If a map must carry every requested tag, as in the question, use "= #" in the HAVING instead of "> 1".
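For illustration, here is a variant of this second option that requires every requested tag, written for the two tags 'bbb' and 'ccc' from the question's example (so # is 2). It is a sketch, not a measured improvement: it replaces the IN subquery with a join and adds a final ORDER BY, because the row order of a derived table is not guaranteed to survive the outer joins.

```sql
SELECT
  m.*,
  FullTags.allTags
FROM (
  -- Step 2: build the full tag list, but only for the 10 selected maps.
  SELECT
    Just10.map_id,
    GROUP_CONCAT(t.tag ORDER BY t.tag SEPARATOR ', ') AS allTags
  FROM (
    -- Step 1: pick at most 10 maps that carry ALL requested tags.
    SELECT mt.map_id
    FROM map_tag mt
    JOIN `tag` t ON t.tag_id = mt.tag_id
    WHERE t.tag IN ('bbb', 'ccc')
    GROUP BY mt.map_id
    HAVING COUNT(DISTINCT mt.tag_id) = 2   -- number of requested tags
    ORDER BY mt.map_id DESC
    LIMIT 0, 10
  ) Just10
  JOIN map_tag mt2 ON mt2.map_id = Just10.map_id
  JOIN `tag` t ON t.tag_id = mt2.tag_id
  GROUP BY Just10.map_id
) FullTags
-- Step 3: fetch the map rows for those ids.
JOIN map m ON m.map_id = FullTags.map_id
ORDER BY m.map_id DESC;
```

Because the LIMIT is applied before the tag lists are built, the expensive GROUP_CONCAT step touches at most 10 maps. Running EXPLAIN on it against the real tables would show whether the temporary table and filesort are now confined to that small set.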