MySQL ORDER BY with CASE: too slow, is there a faster way?

Looking at the various answers for ORDER BY with CASE, I see that what I am forced to do in this legacy application is probably the expert-recommended method; however, it is too slow once the table is non-trivial in size (100,000 rows or more cause page loads of 10 seconds).

Please note that the original query solves an apparently common problem: empty dates need to be sorted counter to where they would normally fall. In this case, datefirstprinted is to be sorted descending, but all records that have not been printed yet should be placed at the top of the list.

The Original Query solves this, but the point of the question is to avoid the filesort performance hit that comes with the derived column notprintedyet.

Original Query

SELECT SQL_NO_CACHE
  id, daterun, datefirstprinted,
  case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end as notprintedyet
FROM
  patientrecords
WHERE
  dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
  notprintedyet desc,                                 /* ordered via alias */
  datefirstprinted desc
LIMIT 10;

time 1.52s


I found that not sorting on the alias notprintedyet saves a bit:

Slightly Faster Query

SELECT SQL_NO_CACHE
  id, daterun, datefirstprinted,
  case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end as notprintedyet
FROM
  patientrecords
WHERE
  dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
  datefirstprinted = "0000-00-00 00:00:00" desc,      /* directly ordered */
  datefirstprinted
LIMIT 10;

time 1.37s


Optimal Speed, but missing required sorting of empty dates first

SELECT SQL_NO_CACHE
  id, daterun, datefirstprinted,
  case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end as notprintedyet
FROM
  patientrecords
WHERE
  dateuploaded <> '0000-00-00 00:00:00'
ORDER BY                        
  datefirstprinted                                     /* not ordered properly */
LIMIT 10;

time 0.48s


I tried using a view

create view notprinted_patientrecords as (
   SELECT id, daterun, datefirstprinted, case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end notprintedyet
   FROM patientrecords
   WHERE dateuploaded <> '0000-00-00 00:00:00'
);

Unfortunately, when I run EXPLAIN:

 explain select * from notprinted_patientrecords order by notprintedyet desc limit 10;

it shows that the query still uses filesort and takes 1.51s, i.e. no savings at all.


Would it be faster if datefirstprinted defaulted to NULL?

Maybe, but in this legacy app that could do more harm than the 5 extra seconds of page load time.


What else might we try? Stored procedures? Functions?


UPDATES

As suggested by @strawberry: ORDER BY CASE

...
ORDER BY                        
  case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end, datefirstprinted
LIMIT 10;

time 1.52s


As requested by @e4c5, the EXPLAIN output:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: patientrecords
         type: range
possible_keys: dateuploaded,uploads_report
          key: dateuploaded
      key_len: 5
          ref: NULL
         rows: 299095
        Extra: Using index condition; Using filesort

The exception is the "not ordered properly" query, whose EXPLAIN output differs as follows:

        rows: 10
        Extra: Using where

CREATE TABLE statement:

*************************** 1. row ***************************
Table: patientrecords
Create Table: CREATE TABLE `patientrecords` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `datecreated` datetime NOT NULL,
  `dateuploaded` datetime NOT NULL,
  `daterun` datetime NOT NULL,
  `datebilled` datetime NOT NULL,
  `datefirstprinted` datetime NOT NULL,
  `datelastprinted` datetime NOT NULL,
  `client` varchar(5) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `dateuploaded` (`dateuploaded`),
  KEY `daterun` (`daterun`),
  KEY `uploads_report` (`dateuploaded`,`client`),
  KEY `datefirstprinted` (`datefirstprinted`),
  KEY `datelastprinted` (`datelastprinted`)
)

Looking at your table, the first thing to note is that the following index is redundant:

KEY `dateuploaded` (`dateuploaded`),

Its role can be fulfilled by this one:

KEY `uploads_report` (`dateuploaded`,`client`),

So let's drop the dateuploaded key. It's not clear whether you actually use the client column in any queries. If you don't, I believe changing your index as follows will give you a big speed-up:

KEY `uploads_report` (`dateuploaded`,`datefirstprinted`,`client`),

This is because MySQL generally uses only one index per table in a query (index merge optimizations aside). Since the index on the dateuploaded column is being used for the WHERE clause, the index on datefirstprinted cannot also be used. If you combine the two columns into the same index, it can potentially serve both the sort and the WHERE clause.
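As a rough sketch, the two changes above could be applied in a single statement. The index and column names come from your CREATE TABLE output; that nothing else depends on the single-column dateuploaded key is an assumption you would need to verify first.

```sql
-- Sketch only: names are taken from the CREATE TABLE output in the question.
-- Assumption: no query or index hint relies on the single-column `dateuploaded` key.
ALTER TABLE patientrecords
  DROP INDEX dateuploaded,      -- redundant: it is a prefix of uploads_report
  DROP INDEX uploads_report,    -- removed so it can be redefined under the same name
  ADD INDEX uploads_report (dateuploaded, datefirstprinted, client);
```

Rebuilding an index on a table of this size can take a while, so it is probably worth trying on a copy of the data before touching production.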

After you have made the above index, this one could probably be dropped:

KEY `datefirstprinted` (`datefirstprinted`),

Having fewer indexes will make your inserts and updates faster.
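A possible way to do that, assuming the composite index is already in place and no other query relies on the single-column key (worth checking with EXPLAIN on your other queries before dropping anything):

```sql
-- List the current indexes to confirm the composite index exists.
SHOW INDEX FROM patientrecords;

-- Assumption: nothing else needs an index that starts with datefirstprinted.
ALTER TABLE patientrecords
  DROP INDEX datefirstprinted;
```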

Following ideas learned on concatenated indexes thanks to @e4c5, I tried adding a key on the two columns (the column used in the WHERE clause and the column used in the CASE-based ORDER BY clause):

alter table
  patientrecords
add index
  printedvsuploaded (datefirstprinted, dateuploaded);

This initially had no effect, since MySQL continued to use the index dateuploaded.
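One way to see which index the optimizer picks, sketched here with the same query and no index hint, is to prefix it with EXPLAIN and read the key column of the output:

```sql
-- Sketch: same query as before, without any index hint.
-- In the output, `key` names the chosen index and `Extra` shows whether
-- "Using filesort" is still present.
EXPLAIN
SELECT
  id, daterun, datefirstprinted
FROM
  patientrecords
WHERE
  dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
  CASE WHEN datefirstprinted = '0000-00-00 00:00:00' THEN 1 ELSE 0 END DESC,
  datefirstprinted
LIMIT 10;
```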

However, adding FORCE INDEX reduces the query time:

SELECT SQL_NO_CACHE
  id, daterun, datefirstprinted
FROM
  patientrecords
FORCE INDEX (printedvsuploaded)
WHERE
  dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
  case when datefirstprinted = "0000-00-00 00:00:00" then 1 else 0 end desc,
  datefirstprinted
LIMIT 10;

time 0.64s

It is worth noting that I agree with @e4c5 that the extra index will eventually cost some write performance; I am counting on other roadmap development to help reduce the index count. For now, this change brings the 10-second page loads for the larger result sets down to a manageable 3 seconds or so, so it is the solution that will be implemented.
