
SQL Query becomes incredibly slow when ordering empty set

I have an SQL query that takes a long time to evaluate because it operates on a very big dataset. While trying to improve its execution time, I found the following:

When executing the following query, the MySQL server takes a long time (up to 100 seconds):

SELECT some_data 
FROM   table 
       INNER JOIN anothertable 
               ON ( table.value = 
                               anothertable.value ) 
WHERE  ( table.parent = 56521 
         AND table.date >= 
             '2016-10-19 08:37:45.606947' ) 
ORDER  BY table.date DESC 
LIMIT  1

So I guessed that the sorting part of the query takes most of the execution time, and I manually removed the sorting to see the difference in execution:

SELECT some_data 
FROM   table 
       INNER JOIN anothertable 
               ON ( table.value = 
                               anothertable.value ) 
WHERE  ( table.parent = 56521 
         AND table.date >= 
             '2016-10-19 08:37:45.606947' ) 
LIMIT  1

The query above takes 0.45 seconds and returns an empty result set.

I came to the conclusion that my query orders the WHOLE data set before evaluating the WHERE clause. How should I write the query to prevent that behaviour? Why does this behaviour show up?

These are the EXPLAIN outputs for the slow and the fast query:

Slow:
    +----+-------------+-------+------------+--------+------------------------------------------+------------------+---------+------------------------------+------+----------+-------------+
    | id | select_type | table | partitions | type   | possible_keys                            | key              | key_len | ref                          | rows | filtered | Extra       |
    +----+-------------+-------+------------+--------+------------------------------------------+------------------+---------+------------------------------+------+----------+-------------+
    |  1 | SIMPLE      | A     | NULL       | index  | PRIMARY,D4b797d14e515242e7251754c57b7701 | date             | 5       | NULL                         | 1325 |     0.08 | Using where |
    |  1 | SIMPLE      | B     | NULL       | eq_ref | PRIMARY                                  | PRIMARY          | 4       | value                        |    1 |   100.00 | NULL        |
    +----+-------------+-------+------------+--------+------------------------------------------+------------------+---------+------------------------------+------+----------+-------------+

Fast:
    +----+-------------+-------+------------+--------+------------------------------------------+----------------------------------+---------+------------------------------+------+----------+-------+
    | id | select_type | table | partitions | type   | possible_keys                            | key                              | key_len | ref                          | rows | filtered | Extra |
    +----+-------------+-------+------------+--------+------------------------------------------+----------------------------------+---------+------------------------------+------+----------+-------+
    |  1 | SIMPLE      | A     | NULL       | ref    | PRIMARY,D4b797d14e515242e7251754c57b7701 | D4b797d14e515242e7251754c57b7701 | 4       | const                        | 5175 |   100.00 | NULL  |
    |  1 | SIMPLE      | B     | NULL       | eq_ref | PRIMARY                                  | PRIMARY                          | 4       | value                        |    1 |   100.00 | NULL  |
    +----+-------------+-------+------------+--------+------------------------------------------+----------------------------------+---------+------------------------------+------+----------+-------+

MySQL uses the index on date for your first query. It walks that index in descending date order and can partially evaluate the WHERE condition (table.date >= '2016-10-19 08:37:45.606947') from the index alone; if that fits, it has to read parent from your table (which is relatively slow) to see if it fits as well. It can stop as soon as it finds a matching row, because of the ORDER BY and LIMIT 1: the first match in that order is already the newest one.

Your second query uses the index on parent (the index with the long name): it looks up the rows that fit, then reads the date part from your table and checks whether it fits too. Since your result is empty, it has to continue until it has checked all rows with that parent value (which it finds using the index). Had your ordered query used this index, all rows found would additionally have to undergo a filesort before the newest one could be returned.

(I omitted that MySQL also has to check/execute the join, but that is the same in both queries.)
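If you want to confirm how much work each plan actually does, MySQL's session handler counters give a rough picture. This is a minimal sketch, assuming you are allowed to run FLUSH STATUS (it can require extra privileges such as RELOAD). It reuses the placeholder names `table`, `anothertable` and `some_data` from the question; `table` is a reserved word in MySQL, so it is quoted with backticks here.

```sql
-- Placeholders `table`, `anothertable`, `some_data` come from the question.
FLUSH STATUS;  -- reset this session's status counters to zero

SELECT some_data
FROM   `table`
       INNER JOIN anothertable
               ON ( `table`.value = anothertable.value )
WHERE  ( `table`.parent = 56521
         AND `table`.date >= '2016-10-19 08:37:45.606947' )
ORDER  BY `table`.date DESC
LIMIT  1;

-- Handler_read_prev counts entries read while walking an index backwards
-- (ORDER BY ... DESC); Handler_read_next counts entries read forwards.
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Repeat the same three steps with the unordered query and compare: a large Handler_read_prev value for the ordered query would mean MySQL walked a long stretch of the date index, while the unordered query should mostly show Handler_read_next reads through the index on parent.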

You obviously have a lot more rows that fit your date condition than your parent condition, so the first query has to do more of the relatively slow table lookups, which takes longer.

That is what happens in this case. Depending on your data, however, the very first row checked via your index on date could already fulfill the parent condition, and MySQL could stop right there. If it used the index on parent instead, MySQL would be forced to check all rows with that parent value and then do a filesort. Based on some statistical data, MySQL decided it was worth the risk. Well, it chose wrong: your result is empty, so there is no matching row to stop at, and MySQL ends up checking every candidate row it finds through the date index.
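Because the optimizer only works with estimates, it can help to count how many rows each condition matches on its own. A sketch using the placeholder names from the question; on a very big table these counts may themselves take a moment.

```sql
-- Rows MySQL would visit through the index on parent (D4b797d14e515242e7251754c57b7701)
SELECT COUNT(*) FROM `table` WHERE parent = 56521;

-- Rows MySQL would visit through the index on date
SELECT COUNT(*) FROM `table` WHERE date >= '2016-10-19 08:37:45.606947';
```

If the second number is much larger than the first, that matches the explanation above: walking the date index means far more table lookups to check parent.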

You can do the following:

  • optimize table `table` (where the second `table` stands for your table name) to update your statistics. This helps sometimes, but usually doesn't (because the statistical data is very limited).
  • force MySQL to use the index you know is better (... FROM table force index (D4b797d14e515242e7251754c57b7701) inner join ...)
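  • add the perfect index for your query: a composite index on table(parent, date). With it, MySQL can jump straight to the entries with parent = 56521 and date >= '2016-10-19 08:37:45.606947', which are already sorted by date, so it needs neither table lookups just to check the conditions nor a filesort. Not counting potential effects of the join, this should give you an even faster result than your unordered query, and MySQL will use it on its own.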
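The three options could look roughly like this in MySQL. This is a sketch: `table`, `anothertable` and `some_data` are the placeholders from the question, and idx_parent_date is a hypothetical index name. ANALYZE TABLE is shown as a lighter alternative to OPTIMIZE TABLE: it only recomputes the key distribution statistics, whereas for an InnoDB table OPTIMIZE TABLE rebuilds the whole table, which can take a long time on a big dataset.

```sql
-- Option 1: refresh the statistics the optimizer bases its index choice on.
OPTIMIZE TABLE `table`;  -- as suggested above (rebuilds the table on InnoDB)
ANALYZE TABLE `table`;   -- lighter alternative: only recomputes key statistics

-- Option 2: force the index on parent for this query.
SELECT some_data
FROM   `table` FORCE INDEX (D4b797d14e515242e7251754c57b7701)
       INNER JOIN anothertable
               ON ( `table`.value = anothertable.value )
WHERE  ( `table`.parent = 56521
         AND `table`.date >= '2016-10-19 08:37:45.606947' )
ORDER  BY `table`.date DESC
LIMIT  1;

-- Option 3: composite index covering both the filter and the sort order.
-- idx_parent_date is a hypothetical name; choose your own.
ALTER TABLE `table` ADD INDEX idx_parent_date (parent, date);
```

After adding the composite index, EXPLAIN for the ordered query should list it under key, typically with type range and without "Using filesort" in the Extra column.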
