How does adding GROUP BY make this query more efficient?

Question

I don't understand MySQL's EXPLAIN output for the following two queries.

In the first query, MySQL reports that it has to examine 1238264 records first:

explain select
    count(distinct utc.id)
from
    user_to_company utc
inner join
    users u
        on utc.user_id=u.id
where
    u.is_removed=false
order by
    utc.user_id asc limit 20;

+----+-------------+-------+------+----------------------------+---------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys              | key     | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+----------------------------+---------+---------+------+---------+-------------+
|  1 | SIMPLE      | u     | ALL  | PRIMARY                    | NULL    | NULL    | NULL | 1238264 | Using where |
|  1 | SIMPLE      | utc   | ref  | user_id,FKF513E0271C2D1677 | user_id | 8       | u.id |       1 | Using index |
+----+-------------+-------+------+----------------------------+---------+---------+------+---------+-------------+

In the second query, a GROUP BY was added, which makes MySQL select only 20 records:

explain select
    count(distinct utc.id)
from
    user_to_company utc
inner join
    users u
        on utc.user_id=u.id
where
    u.is_removed=false
group by
    utc.user_id
order by
    utc.user_id asc limit 20;

+----+-------------+-------+--------+----------------------------+--------------------+---------+-------------+------+-------------+
| id | select_type | table | type   | possible_keys              | key                | key_len | ref         | rows | Extra       |
+----+-------------+-------+--------+----------------------------+--------------------+---------+-------------+------+-------------+
|  1 | SIMPLE      | utc   | index  | user_id,FKF513E0271C2D1677 | FKF513E0271C2D1677 | 8       | NULL        |   20 | Using index |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY                    | PRIMARY            | 8       | utc.user_id |    1 | Using where |
+----+-------------+-------+--------+----------------------------+--------------------+---------+-------------+------+-------------+

For reference, there are 1333194 records in the users table and 1327768 records in the user_to_company table.

How does adding the GROUP BY make MySQL select only 20 records in the first pass?

Answer 1

The first query has to read all the data to find all the distinct values of utc.id. Without a GROUP BY, the aggregate returns only one row, which is a summary for the whole table, so the ORDER BY and LIMIT have nothing left to trim. It therefore has to process all the data.
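To make that concrete, here is a toy Python model of an aggregate without GROUP BY. It is only a sketch of the reasoning, not how MySQL is implemented; the table contents and the `count_distinct_ids` helper are made up for illustration.

```python
# Toy model of the first query. Rows are (id, user_id) pairs from a
# hypothetical user_to_company table; the values are invented.
user_to_company = [(1, 10), (2, 10), (3, 11), (4, 12), (5, 12)]
active_users = {10, 12}  # assumed: users with is_removed = false


def count_distinct_ids(rows, active, limit):
    """Model COUNT(DISTINCT id) with no GROUP BY, followed by LIMIT."""
    rows_read = 0
    seen = set()
    for utc_id, user_id in rows:
        rows_read += 1  # every row has to be examined before the count is known
        if user_id in active:
            seen.add(utc_id)
    result = [len(seen)]  # one summary row for the whole table
    # LIMIT is applied to the one-row result, so it saves no work.
    return result[:limit], rows_read


result, rows_read = count_distinct_ids(user_to_company, active_users, limit=20)
print(result, rows_read)  # [4] 5
```

The limit of 20 only ever sees a single row here, which is why it cannot shorten the scan.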

The second query produces a separate count for each utc.user_id. You have a LIMIT clause and an index on utc.user_id. MySQL is, apparently, smart enough to recognize that it can read the index in order to get the first 20 values of utc.user_id, and it uses these to generate the counts.
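The early stop can be sketched with a toy Python model of an ordered index scan. This is an illustration under stated assumptions, not MySQL's actual algorithm: `index_entries`, `removed_users`, and `grouped_counts` are hypothetical names, and the data is generated just for the example.

```python
# Toy model of the second query. The "index" is a list of (user_id, id)
# pairs already sorted by user_id, as an index on user_id would deliver them.
index_entries = [(u, u * 10 + k) for u in range(1, 101) for k in range(3)]
removed_users = {2, 5}  # assumed: users with is_removed = true


def grouped_counts(index, removed, limit):
    """Count distinct ids per user_id in index order, stopping after `limit` groups."""
    results = []
    entries_read = 0
    current_user, ids = None, set()
    for user_id, utc_id in index:
        if user_id != current_user:
            # The previous group is complete because the index is ordered.
            if current_user is not None and current_user not in removed:
                results.append((current_user, len(ids)))
                if len(results) == limit:
                    return results, entries_read  # LIMIT reached: stop scanning
            current_user, ids = user_id, set()
        entries_read += 1
        ids.add(utc_id)
    # Flush the last group if the scan ran to the end of the index.
    if current_user is not None and current_user not in removed and len(results) < limit:
        results.append((current_user, len(ids)))
    return results, entries_read


results, entries_read = grouped_counts(index_entries, removed_users, limit=20)
print(len(results), entries_read, len(index_entries))
```

Because the groups arrive already sorted by user_id, the scan can stop as soon as 20 qualifying groups are complete, reading only a small prefix of the index. The `rows` value in EXPLAIN is an estimate, so the real number of entries touched may differ from 20.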

I am surprised that MySQL is smart enough to do this (although the logic is documented pretty well here). But it makes perfect sense that the second query can be optimized this way while the first one cannot.