
SQL Server: why is the index not used with OR?

I've been studying indexes, trying to understand how they work and how I can use them to improve performance, but I'm missing something.

I have the following table:

Person:

| Id | Name | Email | Phone |
| 1  | John |  E1   |  P1   |
| 2  | Max  |  E2   |  P2   |

I'm trying to find the best way to index the Email and Phone columns, given that the queries will (most of the time) take one of these forms:

[1] SELECT * FROM Person WHERE Email = '...' OR Phone = '...'
[2] SELECT * FROM Person WHERE Email = '...'
[3] SELECT * FROM Person WHERE Phone = '...'

I thought the best approach would be to create a single index on both columns:

CREATE NONCLUSTERED INDEX [IX_EmailPhone]
ON [dbo].[Person]([Email], [Phone]);

However, with the index above, only query [2] benefits from an index seek; the others use an index scan.

I also tried creating multiple indexes: one with both columns, one for Email, and one for Phone. In this case, [2] and [3] use a seek, but [1] still uses a scan.

Why can't the database use an index with an OR? Given these queries, what would be the best indexing approach for this table?

Create a separate index for each column.
Using table hints (FORCESCAN, FORCESEEK), we can force the optimizer to use or avoid the indexes, so you can inspect each execution plan, get a feel for the cost involved, and understand what each access path means.

Go through my demo and consider the work involved in each path for the following scenarios:

  1. Only a few rows satisfy the condition j = 123.
     Only a few rows satisfy the condition k = 456.

  2. Most rows satisfy the condition j = 123.
     Most rows satisfy the condition k = 456.

  3. Only a few rows satisfy the condition j = 123.
     Most rows satisfy the condition k = 456.

Try to work out which path you would choose for each scenario.
Feel free to ask questions.


Demo

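-- Build a 1,000,000-row heap t:
--   the recursive CTE t(n) yields n = 0..999, and the cross join t0 x t1 yields 1,000 * 1,000 rows;
--   i is a unique row number (1..1,000,000); j and k are pseudo-random integers in 0..999.
;with t(n) as (select 0 union all select n+1 from t where n < 999)

select      1+t0.n+1000*t1.n                                as i
           ,floor(rand(cast (newid() as varbinary))*1000)   as j   -- seeding rand() with newid() gives a new value per row
           ,floor(rand(cast (newid() as varbinary))*1000)   as k 

into        t

from        t t0,t t1 

option       (maxrecursion 0)   -- lift the default limit of 100 recursion levels
;

-- One single-column nonclustered index per predicate column
create index t_j on t (j);
create index t_k on t (k);

update statistics t (t_j)
update statistics t (t_k)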
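As a rough size check for the demo, inclusion-exclusion gives the expected number of matching rows. This assumes j and k are independent and roughly uniform over 0..999, which the random generation only approximates, so actual counts vary from run to run. Under that assumption about 2,000 of the 1,000,000 rows (roughly 0.2%) match, so the generated data falls under scenario 1.

```python
# Expected match counts for the demo table t, ASSUMING j and k are independent
# and uniform over 0..999 (actual counts vary with the random data).
TOTAL_ROWS = 1000 * 1000  # t t0 cross join t t1: 1,000 x 1,000 rows
p_j = 1 / 1000            # P(j = 123)
p_k = 1 / 1000            # P(k = 456)

# Inclusion-exclusion: P(j = 123 OR k = 456) = P(j) + P(k) - P(j AND k),
# where P(j AND k) = p_j * p_k under the independence assumption
p_or = p_j + p_k - p_j * p_k

expected_j = TOTAL_ROWS * p_j
expected_k = TOTAL_ROWS * p_k
expected_or = TOTAL_ROWS * p_or

print(f"j = 123:            ~{expected_j:.0f} rows")
print(f"k = 456:            ~{expected_k:.0f} rows")
print(f"j = 123 OR k = 456: ~{expected_or:.0f} rows ({p_or:.2%} of the table)")
```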

Scan

select      *
from        t with (forcescan)   -- allow only scan access paths
where       j = 123
        or  k = 456
  • This is straightforward: the whole table is read once, and every row is tested against both conditions.


Seek

select      *
from        t with (forceseek)   -- allow only index seek access paths
where       j = 123
        or  k = 456
  • "Index Seek": each index is seeked for its relevant value (123 in t_j, 456 in t_k).
  • "Merge Join": the results (row IDs) are concatenated (as in UNION ALL).
  • "Stream Aggregate": duplicate row IDs are eliminated.
  • "RID Lookup" + "Nested Loops": the row IDs are used to retrieve the rows from the table (t).
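The four plan steps above can be modelled as a toy in Python. This is a hypothetical, simplified model, not SQL Server's implementation: each index is a sorted list of (key, row ID) pairs, the heap is a dict from row ID to row, and `build_index`, `index_seek`, `stream_dedup` and `or_via_index_union` are illustrative names. Because each modelled index is ordered by key and then row ID, an equality seek returns row IDs already sorted, which is what lets the merge and the adjacent-duplicate check work.

```python
import heapq
from bisect import bisect_left, bisect_right

# Hypothetical heap: row ID (RID) -> row. Stands in for table t.
HEAP = {
    1: {"i": 1, "j": 123, "k": 7},
    2: {"i": 2, "j": 5, "k": 456},
    3: {"i": 3, "j": 123, "k": 456},   # matches both predicates
    4: {"i": 4, "j": 9, "k": 9},
}

def build_index(heap, column):
    """Model a nonclustered index as (key, rid) pairs sorted by key, then RID."""
    return sorted((row[column], rid) for rid, row in heap.items())

def index_seek(index, value):
    """'Index Seek': binary-search the contiguous range where key == value."""
    lo = bisect_left(index, (value,))
    hi = bisect_right(index, (value, float("inf")))
    return [rid for _, rid in index[lo:hi]]  # already in RID order

def stream_dedup(sorted_rids):
    """'Stream Aggregate': on sorted input, duplicates are adjacent."""
    out = []
    for rid in sorted_rids:
        if not out or out[-1] != rid:
            out.append(rid)
    return out

def or_via_index_union(heap, idx_j, idx_k, j_value, k_value):
    # Seek both indexes and merge the two RID-ordered streams (duplicates kept)
    merged = heapq.merge(index_seek(idx_j, j_value), index_seek(idx_k, k_value))
    # Drop RIDs that matched both predicates, then fetch each row ('RID Lookup')
    return [heap[rid] for rid in stream_dedup(merged)]

idx_j, idx_k = build_index(HEAP, "j"), build_index(HEAP, "k")
result = or_via_index_union(HEAP, idx_j, idx_k, 123, 456)
scan = [row for row in HEAP.values() if row["j"] == 123 or row["k"] == 456]
print([row["i"] for row in result])  # [1, 2, 3]
```

Row 3 is found by both seeks, and the de-duplication step is what keeps it from being returned twice; the result matches the plain scan.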


Use two separate indexes: one on (Email) and one on (Phone, Email).

An OR is difficult to satisfy with a single index, because each side of the condition needs its own seek on a different column. If your conditions were connected by AND rather than OR, your index would be used for the first query (but not for the third, because Phone is not the first key in the index).
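A toy model of a composite index illustrates this leftmost-key rule. Assume, hypothetically, that the index on (Email, Phone) is a list of entries sorted by Email and then Phone; the sample rows and function names below are made up for illustration and are not SQL Server internals. Email alone, or Email and Phone together, map to one contiguous range that binary search can find, while Phone alone is scattered through the list and needs a full pass.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample rows: (row_id, email, phone)
PEOPLE = [(1, "E1", "P1"), (2, "E2", "P2"), (3, "E3", "P1"), (4, "E4", "P4")]

# Toy composite index on (Email, Phone): entries sorted by Email, then Phone
INDEX = sorted((email, phone, rid) for rid, email, phone in PEOPLE)
EMAIL_KEYS = [email for email, _, _ in INDEX]  # leading key, in index order

def seek_email(email):
    """Email is the leading key: its matches form one contiguous range."""
    lo, hi = bisect_left(EMAIL_KEYS, email), bisect_right(EMAIL_KEYS, email)
    return sorted(rid for _, _, rid in INDEX[lo:hi])

def seek_email_and_phone(email, phone):
    """Email AND Phone: still one contiguous range in (Email, Phone) order."""
    lo = bisect_left(INDEX, (email, phone))
    hi = bisect_right(INDEX, (email, phone, float("inf")))
    return sorted(rid for _, _, rid in INDEX[lo:hi])

def scan_for_phone(phone):
    """Phone alone is not a prefix of the sort order: every entry is read."""
    return sorted(rid for _, p, rid in INDEX if p == phone)

def email_or_phone(email, phone):
    # The Email branch could seek, but the Phone branch still reads every entry,
    # so the OR as a whole costs a full pass over the index.
    return sorted(set(seek_email(email)) | set(scan_for_phone(phone)))

print(seek_email("E3"))                  # [3]
print(seek_email_and_phone("E1", "P1"))  # [1]
print(scan_for_phone("P1"))              # [1, 3]
print(email_or_phone("E2", "P1"))        # [1, 2, 3]
```

In the OR case, the Phone branch forces the full pass, which mirrors the index scan the question observed for query [1].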

You can rewrite the first query as:

SELECT *
FROM Person
WHERE Email = '...'
UNION ALL
SELECT *
FROM Person
WHERE (Email <> '...' OR Email IS NULL)  -- skip rows already returned above, but keep NULL emails
  AND Phone = '...';

SQL Server should then be able to use the appropriate index for each subquery: (Email) for the first branch and (Phone, Email) for the second.
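The rewrite has to return exactly the rows the OR returns. The `Email <> '...'` guard keeps a row that matches both conditions from appearing twice, but if Email is nullable, a plain `<>` also drops rows whose Email is NULL (NULL <> 'x' is not true), so an extra `OR Email IS NULL` is needed. The sketch below checks this with SQLite through Python's built-in `sqlite3` module rather than SQL Server, so it verifies only which rows come back, not which indexes get used; the table contents and index names are hypothetical.

```python
import sqlite3

# Hypothetical sample data; row 3 has a NULL Email, row 5 matches both conditions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Name TEXT, Email TEXT, Phone TEXT)")
conn.executemany(
    "INSERT INTO Person (Id, Name, Email, Phone) VALUES (?, ?, ?, ?)",
    [
        (1, "John", "E1", "P1"),
        (2, "Max", "E2", "P2"),
        (3, "Ann", None, "P1"),
        (4, "Bob", "E1", "P1"),
        (5, "Eve", "E2", "P1"),
    ],
)
# The two indexes suggested above (index names are made up)
conn.execute("CREATE INDEX IX_Person_Email ON Person (Email)")
conn.execute("CREATE INDEX IX_Person_Phone_Email ON Person (Phone, Email)")

OR_QUERY = "SELECT Id FROM Person WHERE Email = ? OR Phone = ?"

# No guard at all: a row matching both conditions comes back twice
NO_GUARD = """
    SELECT Id FROM Person WHERE Email = ?
    UNION ALL
    SELECT Id FROM Person WHERE Phone = ?
"""

# Plain <> guard: no duplicates, but NULL <> 'E2' is not true, so NULL-Email rows vanish
PLAIN_GUARD = """
    SELECT Id FROM Person WHERE Email = ?
    UNION ALL
    SELECT Id FROM Person WHERE Email <> ? AND Phone = ?
"""

# NULL-safe guard: same rows as the OR query
NULL_SAFE_GUARD = """
    SELECT Id FROM Person WHERE Email = ?
    UNION ALL
    SELECT Id FROM Person WHERE (Email <> ? OR Email IS NULL) AND Phone = ?
"""

def ids(sql, *params):
    """Run a query and return the sorted list of Ids (duplicates kept)."""
    return sorted(row[0] for row in conn.execute(sql, params))

email, phone = "E2", "P1"
print(ids(OR_QUERY, email, phone))                # [1, 2, 3, 4, 5]
print(ids(NO_GUARD, email, phone))                # [1, 2, 3, 4, 5, 5]
print(ids(PLAIN_GUARD, email, email, phone))      # [1, 2, 4, 5]
print(ids(NULL_SAFE_GUARD, email, email, phone))  # [1, 2, 3, 4, 5]
```

If Email is declared NOT NULL, the plain guard is already sufficient and the `OR Email IS NULL` term is redundant.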
