Mysql Sub-Select Problem

Question

I have two queries that fetch related tags from a MySQL database. One works and one does not, and my question is: why?

Problem: when I execute the first query, the MySQL server goes to 100% CPU usage and has to be restarted before it works again.

Query 1 (does not work):

SELECT tags.*, COUNT(ct.company_id) AS count
FROM company2tag ct, tags
WHERE ct.company_id IN (
    SELECT ct.company_id FROM company2tag ct
    WHERE ct.tag_id = 18
    GROUP BY ct.company_id
    HAVING COUNT(ct.company_id) = 1
)
AND tags.id != 18
AND tags.id = ct.tag_id
GROUP BY ct.tag_id
ORDER BY count DESC
LIMIT 5;

Query 2 (works):

SELECT tags.*, COUNT(ct.company_id) AS count
FROM company2tag ct, tags
WHERE ct.company_id IN (5864, 5870, 6140, 6221, 6268)
    AND tags.id != 18
    AND tags.id = ct.tag_id
GROUP BY ct.tag_id
ORDER BY count DESC
LIMIT 5;

As far as I understand, the two queries above do exactly the same thing. The only difference is that the first query retrieves its company_id values via a subselect.

How can this happen?

Answer 1

First of all, you may be experiencing problems with the first query because you have two tables aliased to ct: one in the outer query and one in the subquery.

Secondly, you can rewrite the IN as a JOIN:

SELECT tags.*, COUNT(ct.company_id) AS count
FROM company2tag ct
INNER JOIN tags ON tags.id = ct.tag_id
INNER JOIN (
    SELECT company_id FROM company2tag
    WHERE tag_id = 18
    GROUP BY company_id
    HAVING COUNT(company_id) = 1
) ctf ON ct.company_id = ctf.company_id
WHERE tags.id != 18
GROUP BY ct.tag_id
ORDER BY count DESC
LIMIT 5;

Note that I haven't actually tested this.
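A quick way to sanity-check the rewrite is to run both versions against a small made-up dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only shows that the IN version (copied from the question, shadowed ct alias included) and the JOIN version return the same rows; it says nothing about MySQL's execution plans. The table contents and tag names are hypothetical.

```python
import sqlite3

# Hypothetical sample data; the real schema is only inferred from the queries.
SCHEMA = """
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE company2tag (company_id INTEGER, tag_id INTEGER);
INSERT INTO tags VALUES (18, 'php'), (19, 'mysql'), (20, 'java'), (21, 'css');
INSERT INTO company2tag VALUES
    (1, 18), (1, 19),
    (2, 18), (2, 19), (2, 20),
    (3, 19), (3, 21),
    (4, 18), (4, 21);
"""

# Query 1 from the question (IN + subselect).
IN_QUERY = """
SELECT tags.*, COUNT(ct.company_id) AS count
FROM company2tag ct, tags
WHERE ct.company_id IN (
    SELECT ct.company_id FROM company2tag ct
    WHERE ct.tag_id = 18
    GROUP BY ct.company_id
    HAVING COUNT(ct.company_id) = 1
)
AND tags.id != 18
AND tags.id = ct.tag_id
GROUP BY ct.tag_id
ORDER BY count DESC
LIMIT 5
"""

# The JOIN rewrite from this answer.
JOIN_QUERY = """
SELECT tags.*, COUNT(ct.company_id) AS count
FROM company2tag ct
INNER JOIN tags ON tags.id = ct.tag_id
INNER JOIN (
    SELECT company_id FROM company2tag
    WHERE tag_id = 18
    GROUP BY company_id
    HAVING COUNT(company_id) = 1
) ctf ON ct.company_id = ctf.company_id
WHERE tags.id != 18
GROUP BY ct.tag_id
ORDER BY count DESC
LIMIT 5
"""


def run(sql):
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    # Sort in Python because rows with equal counts may come back in any order.
    rows = sorted(con.execute(sql).fetchall())
    con.close()
    return rows


in_rows = run(IN_QUERY)
join_rows = run(JOIN_QUERY)
print(in_rows)               # [(19, 'mysql', 2), (20, 'java', 1), (21, 'css', 1)]
print(in_rows == join_rows)  # True
```

Company 3 has tags 19 and 21 but not tag 18, so both queries leave it out; sorting in Python sidesteps the fact that tags 20 and 21 tie on count and can come back in either order.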

Answer 2

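MySQL is not very good at optimizing IN conditions with subqueries (before MySQL 5.6, an IN (SELECT ...) is typically executed as a dependent subquery).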

The condition in your first query cannot easily be rewritten as an EXISTS, which is why MySQL re-evaluates the subquery, GROUP BY and HAVING included, for each row of the outer query.
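To make the per-row evaluation concrete, here is a toy Python model (not MySQL's real executor) of two strategies: re-running the grouped subquery for every outer row, as a dependent subquery does, versus running it once and probing the result, which is effectively what the literal list in the second query does. The rows and function names are hypothetical.

```python
from collections import Counter

# Hypothetical (company_id, tag_id) rows standing in for company2tag.
COMPANY2TAG = [
    (1, 18), (1, 19),
    (2, 18), (2, 19), (2, 20),
    (3, 19), (3, 21),
    (4, 18), (4, 21),
]


def grouped_subquery(rows, tag_id, stats):
    """Companies with exactly one row for tag_id (GROUP BY ... HAVING COUNT = 1)."""
    stats["subquery_runs"] += 1
    # Each run scans every row, like a GROUP BY with no usable index.
    counts = Counter(company for company, tag in rows if tag == tag_id)
    return {company for company, n in counts.items() if n == 1}


def dependent_plan(rows, tag_id):
    """Re-evaluate the subquery for every outer row: a full scan per row."""
    stats = {"subquery_runs": 0}
    matched = []
    for company, tag in rows:
        if company in grouped_subquery(rows, tag_id, stats) and tag != tag_id:
            matched.append((company, tag))
    return matched, stats["subquery_runs"]


def materialized_plan(rows, tag_id):
    """Evaluate the subquery once, then probe the resulting set per outer row."""
    stats = {"subquery_runs": 0}
    companies = grouped_subquery(rows, tag_id, stats)
    matched = [(c, t) for c, t in rows if c in companies and t != tag_id]
    return matched, stats["subquery_runs"]


dep_rows, dep_runs = dependent_plan(COMPANY2TAG, 18)
mat_rows, mat_runs = materialized_plan(COMPANY2TAG, 18)
print(dep_rows == mat_rows)  # True: same rows either way
print(dep_runs, mat_runs)    # 9 1
```

With n rows in company2tag the dependent strategy performs about n full scans, so its cost grows roughly quadratically, which would be consistent with the CPU spike described in the question.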

If you want to select the company_id values that appear only once with tag 18 (which is what HAVING COUNT(ct.company_id) = 1 checks), it's better to rewrite the query like this:

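SELECT  tags.*, COUNT(ct.company_id) AS count
FROM    company2tag ct
JOIN    tags
ON      tags.id = ct.tag_id
WHERE   ct.tag_id <> 18
        -- the company has at least one tag 18 row...
        AND EXISTS
        (
        SELECT  NULL
        FROM    company2tag cte
        WHERE   cte.tag_id = 18
                AND cte.company_id = ct.company_id
        )
        -- ...but not a second one (LIMIT 1, 1 skips the first match)
        AND NOT EXISTS
        (
        SELECT  NULL
        FROM    company2tag cti
        WHERE   cti.tag_id = 18
                AND cti.company_id = ct.company_id
        LIMIT 1, 1
        )
GROUP BY
        ct.tag_id
ORDER BY
        count DESC
LIMIT 5;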

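The main idea here is that you don't need COUNT(*): it's enough to check whether at least two values exist. LIMIT 1, 1 skips the first matching row, so the NOT EXISTS subquery is true only when there is no second one.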
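The LIMIT 1, 1 check can be tried in isolation. MySQL reads LIMIT 1, 1 as "skip one row, then return at most one", so the EXISTS is true only when a second matching row exists. The sketch below uses Python's sqlite3 module, which accepts the same LIMIT offset, count form, as a stand-in for MySQL; the rows and the has_at_least_two helper are hypothetical, and company 2 is deliberately given a duplicate tag 18 row.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE company2tag (company_id INTEGER, tag_id INTEGER)")
# Hypothetical rows: company 1 has tag 18 once, company 2 twice, company 3 never.
con.executemany(
    "INSERT INTO company2tag VALUES (?, ?)",
    [(1, 18), (2, 18), (2, 18), (3, 19)],
)

# LIMIT 1, 1 = skip one row, return at most one, so EXISTS is true
# only when a second matching row is present.
HAS_SECOND_ROW = """
SELECT EXISTS (
    SELECT NULL
    FROM company2tag
    WHERE tag_id = 18 AND company_id = ?
    LIMIT 1, 1
)
"""


def has_at_least_two(company_id):
    return bool(con.execute(HAS_SECOND_ROW, (company_id,)).fetchone()[0])


for company_id in (1, 2, 3):
    print(company_id, has_at_least_two(company_id))
# 1 False
# 2 True
# 3 False
```

The database can stop as soon as it finds the second row, instead of counting every row for the company.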

See this article in my blog for a similar problem:

Counting bans

Having the following index:

CREATE INDEX ix_company2tag_tag_company_id ON company2tag (tag_id, company_id)

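will greatly improve this query, because each correlated subquery becomes an index lookup on (tag_id, company_id) instead of a scan of company2tag.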
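One way to see what the index buys is to ask a query planner how it would run the correlated lookup from the NOT EXISTS subquery. The sketch below again uses SQLite through Python's sqlite3 module; its EXPLAIN QUERY PLAN output looks different from MySQL's EXPLAIN, but the idea carries over: with a (tag_id, company_id) index, each probe is an index search rather than a table scan. The value bound for company_id (42) is arbitrary.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE company2tag (company_id INTEGER, tag_id INTEGER);
CREATE INDEX ix_company2tag_tag_company_id ON company2tag (tag_id, company_id);
""")

# The correlated lookup from the NOT EXISTS subquery, with the outer
# company_id supplied as a bound parameter.
LOOKUP = """
SELECT NULL
FROM company2tag cti
WHERE cti.tag_id = 18 AND cti.company_id = ?
LIMIT 1, 1
"""

plan = con.execute("EXPLAIN QUERY PLAN " + LOOKUP, (42,)).fetchall()
# The human-readable description is the last column of each plan row.
details = [row[-1] for row in plan]
for detail in details:
    print(detail)

uses_index = any("ix_company2tag_tag_company_id" in d for d in details)
print(uses_index)
```

The exact wording of the plan line depends on the SQLite version, which is why the check only looks for the index name.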

2 answers

solution1
3 ACCEPTED 2009-07-29 14:14:13

solution2
2 2009-07-29 13:55:07
