Poor performance with BETWEEN query

I'm trying to find exam results for individual people between multiple periods using this query:

SELECT * FROM RESULTS AS R, Define_Times AS T 
WHERE R.PERSONID = T.PERSONID AND ( 
(R.DATE BETWEEN T.Previous_Month_Start AND T.Previous_Month_End) OR 
(R.DATE BETWEEN T.Next_Month_Start AND T.Next_Month_End) OR 
(R.DATE BETWEEN T.Six_Month_Start AND T.Six_Month_End) OR 
(R.DATE BETWEEN T.One_Year_Start AND T.One_Year_End) OR 
(R.DATE BETWEEN T.Two_Year_Start AND T.Two_Year_End) OR 
(R.DATE BETWEEN T.Three_Year_Start AND T.Three_Year_End) OR 
(R.DATE BETWEEN T.Four_Year_Start AND T.Four_Year_End) )

Previous/Next/One_Year etc. is different for each person.

Explain gives:

| id | select_type | table | type | possible_keys | key  | key_len | ref             | rows  | Extra       |
|  1 | SIMPLE      | T     | ALL  | PEOPLE        | NULL | NULL    | NULL            | 75775 |             |
|  1 | SIMPLE      | R     | ref  | IDX3,IDX2     | IDX3 | 5       | T.PERSONID      |  3550 | Using where |

The Results table has about 300 million rows. Define_Times has 75,000.

It's taking AGES.

I see that the type on the first row is ALL, which is bad. But if it's so bad, why isn't it using the index on PERSONID (called PEOPLE) that it identified as a possible key? What can I do to improve this?

I also can't see it using an index for the date, even though there's one on R.DATE. (DATE is the first of the five columns in the index called IDX2.)

Sorry for any typos - my keyboard is broken, and thanks in advance.

The problem is all the conditions you have ORed together.

If possible, restructure your database so that Define_Times has only four columns:

 CREATE TABLE Define_Times (
    PersonID INTEGER,
    PeriodType SomeType,
    StartDate DATE,
    EndDate DATE )

Then, each person gets 7 records (or more, if there are more periods you're not searching for in your example) in which PeriodType indicates what period the dates specify (you might use text values like PM, NM, SM, 1Y, 2Y, 3Y, 4Y or you might use integer values pointing to a description in another table).
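One way to fill the restructured table is a UNION ALL over the existing wide table, one SELECT per period. This is only a sketch: it assumes the old wide table has been kept under the hypothetical name Define_Times_Old, and that the text codes PM, NM, SM, 1Y, 2Y, 3Y, 4Y are used for PeriodType.

```sql
-- Assumption: Define_Times_Old is the original wide table (hypothetical name),
-- and Define_Times is the new four-column table created above.
INSERT INTO Define_Times (PersonID, PeriodType, StartDate, EndDate)
SELECT PERSONID, 'PM', Previous_Month_Start, Previous_Month_End FROM Define_Times_Old
UNION ALL
SELECT PERSONID, 'NM', Next_Month_Start, Next_Month_End FROM Define_Times_Old
UNION ALL
SELECT PERSONID, 'SM', Six_Month_Start, Six_Month_End FROM Define_Times_Old
UNION ALL
SELECT PERSONID, '1Y', One_Year_Start, One_Year_End FROM Define_Times_Old
UNION ALL
SELECT PERSONID, '2Y', Two_Year_Start, Two_Year_End FROM Define_Times_Old
UNION ALL
SELECT PERSONID, '3Y', Three_Year_Start, Three_Year_End FROM Define_Times_Old
UNION ALL
SELECT PERSONID, '4Y', Four_Year_Start, Four_Year_End FROM Define_Times_Old;
```

With 75,000 people this gives roughly 525,000 rows, which is still tiny next to the 300 million rows in RESULTS.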

Then, rewrite your query like this:

SELECT * FROM RESULTS AS R, Define_Times AS T 
WHERE R.PERSONID = T.PERSONID 
   AND R.DATE BETWEEN T.StartDate AND T.EndDate
   AND T.PeriodType IN ('PM','NM','SM','1Y','2Y','3Y','4Y')

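This query is at least optimizable: each row of Define_Times now carries a single date range, so the join condition is one equality plus one range test instead of seven ORed ranges.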
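A composite index on RESULTS that covers both the join column and the date may help the restructured query, since the existing IDX3 appears to cover only PERSONID and IDX2 starts with DATE. The sketch below assumes MySQL; the index name IDX_PERSON_DATE is made up for illustration, and whether the optimizer actually uses the DATE part has to be confirmed with EXPLAIN rather than taken for granted.

```sql
-- Hypothetical index name. Building an index on ~300 million rows takes
-- considerable time and disk space, so try it on a copy first if you can.
CREATE INDEX IDX_PERSON_DATE ON RESULTS (PERSONID, `DATE`);

-- Check which key the optimizer picks for R after the index exists.
EXPLAIN
SELECT *
FROM RESULTS AS R
INNER JOIN Define_Times AS T
        ON R.PERSONID = T.PERSONID
WHERE R.`DATE` BETWEEN T.StartDate AND T.EndDate
  AND T.PeriodType IN ('PM','NM','SM','1Y','2Y','3Y','4Y');
```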

This query will produce one record per matched period for each result. If your periods do not overlap, that's fine (each result will match at most one period). If your periods do overlap and you only want one record per result, you'll need to do some additional work with DISTINCT or GROUP BY to collapse the duplicates in the result set.
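For the overlapping case, a minimal sketch of the DISTINCT variant looks like this. It assumes the four-column Define_Times layout and the text period codes described above, and it returns only the RESULTS columns, because including T's columns would make every matched period a distinct row again.

```sql
-- Returns each RESULTS row once, even if it falls inside several periods.
SELECT DISTINCT R.*
FROM RESULTS AS R
INNER JOIN Define_Times AS T
        ON R.PERSONID = T.PERSONID
WHERE R.`DATE` BETWEEN T.StartDate AND T.EndDate
  AND T.PeriodType IN ('PM','NM','SM','1Y','2Y','3Y','4Y');
```

Note that DISTINCT also collapses rows of RESULTS that are genuinely identical in every column, so this is only safe if RESULTS has a key that makes each row unique.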

Also, note that if you don't have any additional periods in the Define_Times table then you can remove the AND T.PeriodType part of the WHERE clause.

As a comparison, can you run this equivalent query?

SELECT * FROM Define_Times AS T 
INNER JOIN RESULTS AS R on
(R.PERSONID = T.PERSONID and 
  ( 
  (R.DATE BETWEEN T.Previous_Month_Start AND T.Previous_Month_End) OR 
  (R.DATE BETWEEN T.Next_Month_Start AND T.Next_Month_End) OR 
  (R.DATE BETWEEN T.Six_Month_Start AND T.Six_Month_End) OR 
  (R.DATE BETWEEN T.One_Year_Start AND T.One_Year_End) OR 
  (R.DATE BETWEEN T.Two_Year_Start AND T.Two_Year_End) OR 
  (R.DATE BETWEEN T.Three_Year_Start AND T.Three_Year_End) OR 
  (R.DATE BETWEEN T.Four_Year_Start AND T.Four_Year_End) 
  ) 
)

I've seen the optimizer work much better at times in this form.

Also, since you OR all of the BETWEEN expressions together, the optimizer has pretty much no way to use a date index: any one of the date ranges can satisfy the WHERE clause.

EDIT -- ADDED

If you don't want to run the query, at least try comparing the estimated execution plans.
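In MySQL, the estimated plan comes from prefixing a statement with EXPLAIN, which does not execute the query itself. As a sketch, the INNER JOIN form would be checked like this; prefix the original comma-join query the same way and compare the type, key and rows columns of the two outputs.

```sql
-- EXPLAIN only reports the plan; it does not scan the 300 million rows.
EXPLAIN
SELECT * FROM Define_Times AS T
INNER JOIN RESULTS AS R ON
(R.PERSONID = T.PERSONID AND
  (
  (R.DATE BETWEEN T.Previous_Month_Start AND T.Previous_Month_End) OR
  (R.DATE BETWEEN T.Next_Month_Start AND T.Next_Month_End) OR
  (R.DATE BETWEEN T.Six_Month_Start AND T.Six_Month_End) OR
  (R.DATE BETWEEN T.One_Year_Start AND T.One_Year_End) OR
  (R.DATE BETWEEN T.Two_Year_Start AND T.Two_Year_End) OR
  (R.DATE BETWEEN T.Three_Year_Start AND T.Three_Year_End) OR
  (R.DATE BETWEEN T.Four_Year_Start AND T.Four_Year_End)
  )
);
```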
