I don't understand MySQL's EXPLAIN output for the following two queries.
In the first query, MySQL has to examine 1238264 rows first:
explain select
count(distinct utc.id)
from
user_to_company utc
inner join
users u
on utc.user_id=u.id
where
u.is_removed=false
order by
utc.user_id asc limit 20;
+----+-------------+--------+------+----------------------------+---------+---------+---------------------------------+---------+-------------+
| id | select_type | table  | type | possible_keys              | key     | key_len | ref                             | rows    | Extra       |
+----+-------------+--------+------+----------------------------+---------+---------+---------------------------------+---------+-------------+
|  1 | SIMPLE      | u      | ALL  | PRIMARY                    | NULL    | NULL    | NULL                            | 1238264 | Using where |
|  1 | SIMPLE      | utc    | ref  | user_id,FKF513E0271C2D1677 | user_id | 8       | u.id                            |       1 | Using index |
+----+-------------+--------+------+----------------------------+---------+---------+---------------------------------+---------+-------------+
In the second query, a GROUP BY was added, which makes MySQL select only 20 records:
explain select
count(distinct utc.id)
from
user_to_company utc
inner join
users u
on utc.user_id=u.id
where
u.is_removed=false
group by
utc.user_id
order by
utc.user_id asc limit 20;
+----+-------------+--------+--------+----------------------------+--------------------+---------+-------------------------+------+-------------+
| id | select_type | table  | type   | possible_keys              | key                | key_len | ref                     | rows | Extra       |
+----+-------------+--------+--------+----------------------------+--------------------+---------+-------------------------+------+-------------+
|  1 | SIMPLE      | utc    | index  | user_id,FKF513E0271C2D1677 | FKF513E0271C2D1677 | 8       | NULL                    |   20 | Using index |
|  1 | SIMPLE      | u      | eq_ref | PRIMARY                    | PRIMARY            | 8       | utc.user_id             |    1 | Using where |
+----+-------------+--------+--------+----------------------------+--------------------+---------+-------------------------+------+-------------+
For more info, there are 1333194 records in the users table and 1327768 records in the user_to_company table.
How does adding the GROUP BY make MySQL select only 20 records in the first pass?
The first query has no GROUP BY, so it returns only one row: a summary for the whole table. To compute it, MySQL has to read all the data to find every value of utc.id. The ORDER BY and LIMIT apply only to that single result row, so they cannot reduce the work.
The second query produces a separate count for each utc.user_id. You have a LIMIT clause and an index on utc.user_id. MySQL is, apparently, smart enough to recognize that it can walk the index in order to get the first 20 values of utc.user_id and stop there. It uses these to generate the counts.
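The difference in result shape can be sketched with the tables from the question. This is only an illustration: selecting utc.user_id in the second query is my addition so the groups are visible, and dropping ORDER BY and LIMIT from the first query assumes they have no effect on a one-row result.

```sql
-- Query 1 without GROUP BY: the aggregate collapses every joined row
-- into a single result row, so all matching rows must be read.
-- ORDER BY ... LIMIT 20 on one row changes nothing, so it is omitted here.
SELECT COUNT(DISTINCT utc.id)
FROM user_to_company utc
INNER JOIN users u ON utc.user_id = u.id
WHERE u.is_removed = false;

-- Query 2 with GROUP BY: one result row per user_id.
-- Because the index on utc.user_id already yields rows in group order,
-- the server can stop scanning once 20 groups have been produced.
SELECT utc.user_id,               -- added for illustration only
       COUNT(DISTINCT utc.id)
FROM user_to_company utc
INNER JOIN users u ON utc.user_id = u.id
WHERE u.is_removed = false
GROUP BY utc.user_id
ORDER BY utc.user_id ASC
LIMIT 20;
```

Keep in mind that the rows column in EXPLAIN is an estimate. The second query may read more than 20 index entries in practice, since a user can have several user_to_company rows and users with is_removed set are filtered out after the join.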
I am surprised that MySQL is smart enough to do this (although the logic is documented pretty well here). But it makes perfect sense that the second query can be optimized this way while the first one cannot.