I have two queries to get related tags from a mysql database, one works, one does not, my question is: "why?"
Problem: When executing the first query, the mysql server gets 100% cpu usage, and has to be restarted to function again.
Query 1 (does not work):
SELECT tags.*, COUNT(ct.company_id) AS count
FROM company2tag ct, tags
WHERE ct.company_id IN (
SELECT ct.company_id FROM company2tag ct
WHERE ct.tag_id = 18
GROUP BY ct.company_id
HAVING COUNT(ct.company_id) = 1
)
AND tags.id != 18
AND tags.id = ct.tag_id
GROUP BY ct.tag_id
ORDER BY count DESC
LIMIT 5;
Query 2 (works):
SELECT tags.*, COUNT(ct.company_id) AS count
FROM company2tag ct, tags
WHERE ct.company_id IN (5864, 5870, 6140, 6221, 6268)
AND tags.id != 18
AND tags.id = ct.tag_id
GROUP BY ct.tag_id
ORDER BY count DESC
LIMIT 5;
To my understanding the two queries above do completely the same, the only difference is that the first query retrieves its "company_id's" via a subselect.
How can this happen?
First of all, you may be experiencing problems from the first query because you have two tables aliased to ct... one in the outer query, one in the sub-query.
Secondly, you can rewrite the IN as a JOIN:
SELECT tags.*, COUNT(ct.company_id) AS count
FROM company2tag ct
INNER JOIN tags ON tags.id = ct.tag_id
INNER JOIN (
SELECT company_id FROM company2tag
WHERE tag_id = 18
GROUP BY company_id
HAVING COUNT(company_id) = 1
) ctf ON ct.company_id = ctf.company_id
WHERE tags.id != 18
GROUP BY ct.tag_id
ORDER BY count DESC
LIMIT 5;
Note that I haven't actually tested this.
MySQL
is not very good in optimizing IN
conditions.
The condition in your first query cannot be easily rewritten as EXISTS
, that's why MySQL
checks the results for each row.
If you want to select company_id
's that are mentioned more than once in tag 18
, it's better to rewrite this query as such:
SELECT tags.*, COUNT(company_id) AS count
FROM company2tag ct
JOIN tags
ON tags.id = ct.tag_id
WHERE ct.tag_id <> 18
AND NOT EXISTS
(
SELECT NULL
FROM company2tag cti
WHERE cti.tag_id = 18
AND cti.company_id = ct.company_id
LIMIT 1, 1
)
GROUP BY
ct.tag_id
ORDER BY
count DESC
The main idea here is that you don't need to COUNT(*)
: it's enough just to check that at least two values exist.
See this article in my blog for the similar problem:
Having the following index:
CREATE INDEX ix_company2tag_tag_company_id ON company2tag (tag_id, company_id)
will greatly improve this query.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.