Why does MySQL decide to use an index on the column in the ORDER BY clause even though that column is not in the WHERE clause? This happens when ORDER BY and LIMIT are used together in the query.
Example query:
select col1, col2,col3 from table_name where col1 = 'x' and col3='y' order by colY limit 3;
table_name has 9M records
Without the LIMIT clause, MySQL uses the index on col1, which is much faster.
Better
select col1, col2,col3
from table_name
where col1 = 'x'
and col3 = 'y'
order by col4
limit 3;
The optimal index is one of these two:
INDEX(col1, col3, col4)
INDEX(col3, col1, col4)
In both, the Optimizer can completely resolve the WHERE, do the ORDER BY, and even stop after 3 rows due to the LIMIT.
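As a sketch, the first of those indexes could be created and checked like this. The index name idx_col1_col3_col4 is made up for illustration; the table and column names are the ones from the question.

```sql
-- Hypothetical index name; columns as in the query above.
ALTER TABLE table_name
    ADD INDEX idx_col1_col3_col4 (col1, col3, col4);

-- Check which index the Optimizer picks and whether it still sorts.
EXPLAIN
select col1, col2, col3
from table_name
where col1 = 'x'
  and col3 = 'y'
order by col4
limit 3;
```

If the index is used as intended, the key column of the EXPLAIN output should name it, and the Extra column should not mention "Using filesort".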
Best. Even better performance would come from adding col2 to the end of either. This makes it a "covering" index, so all the work can be done in the index's BTree without touching the data's BTree.
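For illustration, a covering variant of the first index might look like the following. Again, the index name is an assumption, not something from the question.

```sql
-- Hypothetical index name. col2 is appended last so the index
-- contains every column the query selects.
ALTER TABLE table_name
    ADD INDEX idx_col1_col3_col4_col2 (col1, col3, col4, col2);
```

When a query is satisfied entirely from an index, EXPLAIN normally reports "Using index" in the Extra column.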
Back to your question
If you don't have one of those indexes, the Optimizer is in a quandary, and often picks the wrong one of the two likely choices. Let's say you have only
INDEX(col1), INDEX(col4)
Plan A focuses on filtering: Use the index on col1, but then it has to sort all the matching rows before peeling off 3. That might mean fetching a million rows and sorting them.
Plan B avoids sorting: Scan through the index in col4 order. If it is really lucky, the first 3 rows will match the WHERE clause. If it is really unlucky, it will scan the entire table without finding 3 acceptable rows. But they will be sorted!
The "statistics" are meager, so the Optimizer cannot realistically decide between the two choices.
Either Plan could be really slow.
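One way to see how far apart the two plans are on your own data is to push the Optimizer toward each one with an index hint and time the results. This is only a sketch: it assumes the two single-column indexes are named idx_col1 and idx_col4, which are hypothetical names (SHOW INDEX FROM table_name lists the real ones).

```sql
-- Plan A: filter through the col1 index, then sort the matches.
select col1, col2, col3
from table_name FORCE INDEX (idx_col1)
where col1 = 'x'
  and col3 = 'y'
order by col4
limit 3;

-- Plan B: walk the col4 index in order, checking the WHERE as it goes.
select col1, col2, col3
from table_name FORCE INDEX (idx_col4)
where col1 = 'x'
  and col3 = 'y'
order by col4
limit 3;
```

A hint only pins the choice for this one query and this data distribution; one of the composite indexes above avoids the dilemma altogether.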
Similar problems occur with JOINs when the WHERE clause filters on both tables.