Looking at the various answers for ORDER BY with CASE like this one , I see that what I am forced to do in this legacy application is likely an expert method; however, it is too slow when the rows are less than trivial (rows of 100,000 or more cause page loads of 10 seconds).
Please note that the original query seeks to solve an apparently common problem where the query analyst needs dates that are empty sorted counter to how they would normally be sorted. In this case, datefirstprinted
is to be descending, but all records that are not printed should be populated to the top of the list.
The Original Query solves this, but the point of the question is to avoid the filesort
performance hit that comes with the derived column notprintedyet
.
Original Query
SELECT SQL_NO_CACHE
id, daterun, datefirstprinted,
case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end as notprintedyet
FROM
patientrecords
WHERE
dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
notprintedyet desc, /* ordered via alias */
datefirstprinted desc
LIMIT 10;
time 1.52s
I found that not sorting on the alias notprintedyet
saves a bit:
Slightly Faster Query
SELECT SQL_NO_CACHE
id, daterun, datefirstprinted,
case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end as notprintedyet
FROM
patientrecords
WHERE
dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
datefirstprinted = "0000-00-00 00:00:00" desc, /* directly ordered */
datefirstprinted
LIMIT 10;
time 1.37s
Optimal Speed, but missing required sorting of empty dates first
SELECT SQL_NO_CACHE
id, daterun, datefirstprinted,
case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end as notprintedyet
FROM
patientrecords
WHERE
dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
datefirstprinted /* not ordered properly */
LIMIT 10;
time 0.48s
I tried using a view
create view notprinted_patientrecords as (
SELECT id, daterun, datefirstprinted, case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end notprintedyet
FROM patientrecords
WHERE dateuploaded <> '0000-00-00 00:00:00'
);
unfortunately when i run explain
explain select * from notprinted_patientrecords order by notprintedyet desc limit 10;
it shows that i am still using filesort
and takes 1.51s aka no savings at all
Would it be faster if datefirstprinted default is NULL?
maybe, but in this legacy app that could do more harm than the 5 seconds extra in page load time
What else might we try? Stored procedures? Functions?
UPDATES
As suggested @strawberry - ORDER BY CASE
...
ORDER BY
case datefirstprinted when "0000-00-00 00:00:00" then 1 else 0 end, datefirstprinted
LIMIT 10;
time 1.52s
as requested by @e4c5, the explain
output:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: patientrecords
type: range
possible_keys: dateuploaded,uploads_report
key: dateuploaded
key_len: 5
ref: NULL
rows: 299095
Extra: Using index condition; Using filesort
except for not ordered properly which has the following variance
rows: 10
Extra: Using where
create table statement
*************************** 1. row ***************************
Table: patientrecords
Create Table: CREATE TABLE `patientrecords` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`datecreated` datetime NOT NULL,
`dateuploaded` datetime NOT NULL,
`daterun` datetime NOT NULL,
`datebilled` datetime NOT NULL,
`datefirstprinted` datetime NOT NULL,
`datelastprinted` datetime NOT NULL,
`client` varchar(5) NOT NULL,
PRIMARY KEY (`id`),
KEY `dateuploaded` (`dateuploaded`),
KEY `daterun` (`daterun`),
KEY `uploads_report` (`dateuploaded`,`client`),
KEY `datefirstprinted` (`datefirstprinted`),
KEY `datelastprinted` (`datelastprinted`)
)
Looking at your table, the first thing to note is that the following index is redundant
KEY `dateuploaded` (`dateuploaded`),
it's role can be fullfilled by this one
KEY `uploads_report` (`dateuploaded`,`client`),
So let's drop the dateuploaded
key. It's not clear whether you actually use the client column in any queries. If you don't, I do believe changing your index as follows will give you a big speed up
KEY `uploads_report` (`dateuploaded`,`datefirstprinted`,`client`),
This is because mysql can only use one index per table. Since the index on the dateuploaded column is being used in the where clause, the index for the datefirstprinted
cannot be used. But if you combine the two column into the same index it can be used in both the sort and the where.
After you have made the above index, this one could probably be dropped:
KEY `datefirstprinted` (`datefirstprinted`),
Having fewer indexes will make your inserts and updates faster.
Following ideas learned on concatenated indexes thanks to @e4c5, I tried adding a key on the two columns (column used in where
and column used in case
based order
clause):
alter table
patientrecords
add index
printedvsuploaded (datefirstprinted, dateuploaded);
This initially had no effect since mysql continued to use the index dateuploaded
.
However adding force index
reduces the query time:
SELECT SQL_NO_CACHE
id, daterun, datefirstprinted
FROM
patientrecords
FORCE INDEX (printedvsuploaded)
WHERE
dateuploaded <> '0000-00-00 00:00:00'
ORDER BY
case when datefirstprinted = "0000-00-00 00:00:00" then 1 else 0 end desc,
datefirstprinted
LIMIT 10;
time 0.64 seconds
it is worth noting that i agree with @e4c5 that the extra index will eventually cause writes to have a performance hit; i'm counting on other roadmap development to help with the reduction of the index count. for now, implementing this will reduce the 10 second page loads of the larger result sets to the manageable 3 second range and is then the solution that will be implemented.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.