I have an SQL query that takes lots of time to evaluate because it is operating on a very big dataset. When trying to improve the execution time I found out the following:
When executing the following query, the MySQL server takes a long time (up to 100 seconds):
SELECT some_data
FROM table
INNER JOIN anothertable
ON ( table.value = anothertable.value )
WHERE ( table.parent = 56521
AND table.date >= '2016-10-19 08:37:45.606947' )
ORDER BY table.date DESC
LIMIT 1
So I guessed that the sorting part of the query takes up most of the execution time, and I manually removed the sorting to see the difference in execution:
SELECT some_data
FROM table
INNER JOIN anothertable
ON ( table.value = anothertable.value )
WHERE ( table.parent = 56521
AND table.date >= '2016-10-19 08:37:45.606947' )
LIMIT 1
The query above takes 0.45 seconds and returns an empty result set.
I came to the conclusion that my query sorts the WHOLE data set before evaluating the WHERE clause. How should I write the query to prevent that behaviour? Why does this behaviour show up?
These are the EXPLAIN Tables for the slow and the fast query:
Slow:
+----+-------------+-------+------------+--------+------------------------------------------+------------------+---------+------------------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+------------------------------------------+------------------+---------+------------------------------+------+----------+-------------+
| 1 | SIMPLE | A | NULL | index | PRIMARY,D4b797d14e515242e7251754c57b7701 | date | 5 | NULL | 1325 | 0.08 | Using where |
| 1 | SIMPLE | B | NULL | eq_ref | PRIMARY | PRIMARY | 4 | value | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+------------------------------------------+------------------+---------+------------------------------+------+----------+-------------+
Fast:
+----+-------------+-------+------------+--------+------------------------------------------+----------------------------------+---------+------------------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+------------------------------------------+----------------------------------+---------+------------------------------+------+----------+-------+
| 1 | SIMPLE | A | NULL | ref | PRIMARY,D4b797d14e515242e7251754c57b7701 | D4b797d14e515242e7251754c57b7701 | 4 | const | 5175 | 100.00 | NULL |
| 1 | SIMPLE | B | NULL | eq_ref | PRIMARY | PRIMARY | 4 | value | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+------------------------------------------+----------------------------------+---------+------------------------------+------+----------+-------+
MySQL uses the index on date for your first query. It can partially evaluate the where condition (table.date >= '2016-10-19 08:37:45.606947') from that index, and if that fits, it will read parent from your table (which is relatively slow) to see if it fits as well. It can stop as soon as it finds a result (because of the order by and limit 1).
Your second query uses the index on parent (that is the index with the long name), looks for rows that fit, then reads the date part from your table and checks if it fits too. Since no row matches, it has to continue until it has checked all rows with the correct parent value (which it finds using the index). Your second query has no order by, so there is nothing to sort; with an order by, all rows it found would have to undergo a filesort before the newest one could be returned.
(I omitted that MySQL will have to check/execute the join too, but that is the same in both queries.)
You obviously have a lot more rows that fit your date condition than your parent condition, so the first query has to do more of the relatively slow table lookups, which takes longer in this case.
Depending on your data, it could actually happen that the first row checked via your index on date already fulfills the parent condition, and MySQL could stop right there. If it used the index on parent instead, MySQL would be forced to check all rows with that parent value and then do a filesort. MySQL decided, on the basis of some statistical data, that it was worth the risk. Well, it chose wrong.
You can do the following:
Run optimize table `table` (the second table is your table name) to update your statistics. This helps sometimes, but usually doesn't (because the statistical data is very limited).
Force MySQL to use the index on parent: ... FROM table force index (D4b797d14e515242e7251754c57b7701) inner join ...
Add a composite index on table(parent, date). It should (not counting potential effects of the join) give you an even faster result than your unordered query, and MySQL will use it on its own.
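Written out in full, the options above might look like the following sketch. The index name idx_parent_date is a made-up name for illustration, some_data and the table names are the placeholders from the question, and the backticks are needed only because table is a reserved word.

```sql
-- Option 1: refresh the optimizer statistics.
-- (ANALYZE TABLE also updates statistics, without rebuilding the table.)
OPTIMIZE TABLE `table`;

-- Option 2: force the index on parent for the ordered query.
SELECT some_data
FROM `table` FORCE INDEX (D4b797d14e515242e7251754c57b7701)
INNER JOIN anothertable
  ON `table`.value = anothertable.value
WHERE `table`.parent = 56521
  AND `table`.date >= '2016-10-19 08:37:45.606947'
ORDER BY `table`.date DESC
LIMIT 1;

-- Option 3: a composite index covering both the filter and the sort.
-- idx_parent_date is a hypothetical name, not one from the question.
CREATE INDEX idx_parent_date ON `table` (parent, date);
```

With the composite index, all entries for one parent value are stored next to each other in date order, so the date range and the order by can both be resolved from the index without a filesort. Running EXPLAIN on the ordered query afterwards should show whether MySQL actually picks it.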