
Poor performance with BETWEEN query

I'm trying to find exam results for individual people within multiple periods using this query:

SELECT * FROM RESULTS AS R, Define_Times AS T 
WHERE R.PERSONID = T.PERSONID AND ( 
(R.DATE BETWEEN T.Previous_Month_Start AND T.Previous_Month_End) OR 
(R.DATE BETWEEN T.Next_Month_Start AND T.Next_Month_End) OR 
(R.DATE BETWEEN T.Six_Month_Start AND T.Six_Month_End) OR 
(R.DATE BETWEEN T.One_Year_Start AND T.One_Year_End) OR 
(R.DATE BETWEEN T.Two_Year_Start AND T.Two_Year_End) OR 
(R.DATE BETWEEN T.Three_Year_Start AND T.Three_Year_End) OR 
(R.DATE BETWEEN T.Four_Year_Start AND T.Four_Year_End) )

The Previous/Next/One_Year etc. periods are different for each person.

EXPLAIN gives:

| id | select_type | table | type | possible_keys | key  | key_len | ref             | rows  | Extra       |
|  1 | SIMPLE      | T     | ALL  | PEOPLE        | NULL | NULL    | NULL            | 75775 |             |
|  1 | SIMPLE      | R     | ref  | IDX3,IDX2     | IDX3 | 5       | T.PERSONID      |  3550 | Using where |

The RESULTS table has about 300 million rows. Define_Times has 75,000.

It's taking AGES.

I see that the first type is ALL, which is bad. But if it's so bad, why is it not using the index on PERSONID (called PEOPLE) that it identified as a possible key? What can I do to improve this?

I also can't see it using an index for the date, even though there's one on R.DATE. (It's the first column in the sequence of 5 on the index called IDX2.)

Sorry for any typos (my keyboard is broken), and thanks in advance.

The problem is all the conditions you have ORed together.

If possible, restructure your database so that Define_Times has only four columns:

 CREATE TABLE Define_Times (
    PersonID INTEGER,
    PeriodType SomeType,
    StartDate DATE,
    EndDate DATE )

Then, each person gets 7 records (or more, if there are more periods that you're not searching for in your example), in which PeriodType indicates which period the dates specify. You might use text values like PM, NM, SM, 1Y, 2Y, 3Y, 4Y, or you might use integer values pointing to a description in another table.
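Populating the narrow table from the existing wide one could look roughly like the sketch below. It assumes the original table has been renamed to Define_Times_Wide (a hypothetical name), that the new four-column Define_Times already exists with a text PeriodType, and that the codes map to the columns in the obvious way (PM = Previous_Month, NM = Next_Month, SM = Six_Month, and so on); none of that is stated in the question.

```sql
-- Sketch only. Define_Times_Wide is a hypothetical name for the original
-- wide table; the code-to-column mapping below is an assumption.
INSERT INTO Define_Times (PersonID, PeriodType, StartDate, EndDate)
SELECT PERSONID, 'PM', Previous_Month_Start, Previous_Month_End FROM Define_Times_Wide
UNION ALL
SELECT PERSONID, 'NM', Next_Month_Start, Next_Month_End FROM Define_Times_Wide
UNION ALL
SELECT PERSONID, 'SM', Six_Month_Start, Six_Month_End FROM Define_Times_Wide
UNION ALL
SELECT PERSONID, '1Y', One_Year_Start, One_Year_End FROM Define_Times_Wide
UNION ALL
SELECT PERSONID, '2Y', Two_Year_Start, Two_Year_End FROM Define_Times_Wide
UNION ALL
SELECT PERSONID, '3Y', Three_Year_Start, Three_Year_End FROM Define_Times_Wide
UNION ALL
SELECT PERSONID, '4Y', Four_Year_Start, Four_Year_End FROM Define_Times_Wide;
```

With 75,000 people and seven periods each, that gives about 525,000 rows in the narrow table.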

Then, rewrite your query like this:

SELECT * FROM RESULTS AS R, Define_Times AS T 
WHERE R.PERSONID = T.PERSONID 
   AND R.DATE BETWEEN T.StartDate AND T.EndDate
   AND T.PeriodType IN ('PM','NM','SM','1Y','2Y','3Y','4Y')

This query is at least optimizable.
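To see whether it actually is optimized, it may help to give the join an index that covers both the equality column and the range column, and then look at the plan. The sketch below assumes a composite index on RESULTS (PERSONID, DATE), which is not mentioned in the question; the index name is made up. Building an index on a 300-million-row table will itself take a while, and whether the optimizer picks it depends on the MySQL version and table statistics.

```sql
-- Assumed composite index (hypothetical name): equality column first,
-- range column second.
CREATE INDEX idx_results_person_date ON RESULTS (PERSONID, `DATE`);

-- Check which key the plan uses for R after the change.
EXPLAIN
SELECT * FROM RESULTS AS R, Define_Times AS T
WHERE R.PERSONID = T.PERSONID
   AND R.DATE BETWEEN T.StartDate AND T.EndDate
   AND T.PeriodType IN ('PM','NM','SM','1Y','2Y','3Y','4Y');
```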

This query will produce one record per matched period for each person. If your periods do not overlap, that's fine (there will only ever be one matching record). If your periods do overlap and you only want one record per exam result, you'll need to do some additional work with DISTINCT or GROUP BY to aggregate the records in the result set.
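For the overlapping case, a minimal DISTINCT sketch might look like this. It selects only the RESULTS columns, so a result that falls into two periods comes back once; it assumes the four-column Define_Times layout described above.

```sql
-- Sketch: return each matching RESULTS row once, even if it falls
-- inside several of the person's periods.
SELECT DISTINCT R.*
FROM RESULTS AS R
INNER JOIN Define_Times AS T
   ON R.PERSONID = T.PERSONID
  AND R.DATE BETWEEN T.StartDate AND T.EndDate
WHERE T.PeriodType IN ('PM','NM','SM','1Y','2Y','3Y','4Y');
```

Note that DISTINCT also collapses rows in RESULTS that are identical in every column, so this is only safe if RESULTS has a unique key or such duplicates don't matter to you.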

Also, note that if you don't have any additional periods in the Define_Times table, then you can remove the AND T.PeriodType part of the WHERE clause.

As a comparison, can you run this equivalent query?

SELECT * FROM Define_Times AS T 
INNER JOIN RESULTS AS R on
(R.PERSONID = T.PERSONID and 
  ( 
  (R.DATE BETWEEN T.Previous_Month_Start AND T.Previous_Month_End) OR 
  (R.DATE BETWEEN T.Next_Month_Start AND T.Next_Month_End) OR 
  (R.DATE BETWEEN T.Six_Month_Start AND T.Six_Month_End) OR 
  (R.DATE BETWEEN T.One_Year_Start AND T.One_Year_End) OR 
  (R.DATE BETWEEN T.Two_Year_Start AND T.Two_Year_End) OR 
  (R.DATE BETWEEN T.Three_Year_Start AND T.Three_Year_End) OR 
  (R.DATE BETWEEN T.Four_Year_Start AND T.Four_Year_End) 
  ) 
)

I've seen the optimizer work much better at times in this form.

Also, since you OR all of the date BETWEEN expressions together, the optimizer pretty much has no way to use a date index, since any one of the date ranges can satisfy the WHERE clause.
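A common workaround for ORed range conditions, offered here as an alternative and not something proposed in the answers, is to split the OR into one query per range and combine them with UNION, so each branch has a single BETWEEN that the optimizer can plan on its own. The sketch below uses the original wide Define_Times columns. Whether it is faster on your data is something only EXPLAIN and a test run can tell you.

```sql
-- Sketch: one branch per period instead of seven ORed ranges.
-- UNION (not UNION ALL) removes rows that match more than one period,
-- which mirrors the OR version unless the join output already contains
-- fully identical rows.
SELECT * FROM RESULTS AS R INNER JOIN Define_Times AS T ON R.PERSONID = T.PERSONID
WHERE R.DATE BETWEEN T.Previous_Month_Start AND T.Previous_Month_End
UNION
SELECT * FROM RESULTS AS R INNER JOIN Define_Times AS T ON R.PERSONID = T.PERSONID
WHERE R.DATE BETWEEN T.Next_Month_Start AND T.Next_Month_End
UNION
SELECT * FROM RESULTS AS R INNER JOIN Define_Times AS T ON R.PERSONID = T.PERSONID
WHERE R.DATE BETWEEN T.Six_Month_Start AND T.Six_Month_End
UNION
SELECT * FROM RESULTS AS R INNER JOIN Define_Times AS T ON R.PERSONID = T.PERSONID
WHERE R.DATE BETWEEN T.One_Year_Start AND T.One_Year_End
UNION
SELECT * FROM RESULTS AS R INNER JOIN Define_Times AS T ON R.PERSONID = T.PERSONID
WHERE R.DATE BETWEEN T.Two_Year_Start AND T.Two_Year_End
UNION
SELECT * FROM RESULTS AS R INNER JOIN Define_Times AS T ON R.PERSONID = T.PERSONID
WHERE R.DATE BETWEEN T.Three_Year_Start AND T.Three_Year_End
UNION
SELECT * FROM RESULTS AS R INNER JOIN Define_Times AS T ON R.PERSONID = T.PERSONID
WHERE R.DATE BETWEEN T.Four_Year_Start AND T.Four_Year_End;
```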

EDIT -- ADDED

If you don't want to run the query, at least try comparing the estimated execution plans.
