简体   繁体   中英

SQL Server why is index not used with OR

I've been studying indexes and trying to understand how they work and how I can use them to boost performance, but I'm missing something.

I have the following table:

Person :

| Id | Name | Email | Phone |
| 1  | John |  E1   |  P1   |
| 2  | Max  |  E2   |  P2   |

I'm trying to find the best way to index the columns Email and Phone considering that the queries will (most of the time) be of the form

[1] SELECT * FROM Person WHERE Email = '...' OR Phone = '...'
[2] SELECT * FROM Person WHERE Email = ...
[3] SELECT * FROM Person WHERE Phone = ...

I thought the best approach would be to create a single index using both columns:

CREATE NONCLUSTERED INDEX [IX_EmailPhone]
ON [dbo].[Person]([Email], [PhoneNumber]);

However, with the index above, only the query [2] benefits from an index seek, the others use index scan.

I also tried to create multiple index: one with both columns, one for email, and one for email. In this case, [2] and [3] use seek, but [1] continues to use scan.

Why can't the database use index with an or? What would be the best indexing approach for this table considering the queries?

Create a separate index for each column.
By using hints we can force the optimizer to use/not use the indexes, so you can check the execution plan, get a feeling of the performance involved and understand the meaning of each path.

Go through my demo and consider the work involved in each path for the following scenarios -

  1. Only few rows satisfy the condition j=123.
    Only few rows satisfy the condition k=456.

  2. Most of the rows satisfy the condition j=123.
    Most of the rows satisfy the condition k=456.

  3. Only few rows satisfy the condition j=123.
    Most of the rows satisfy the condition k=456.

Try to think what path you would have chosen for each scenario.
Please feel free to ask questions.


Demo

;with t(n) as (select 0 union all select n+1 from t where n < 999)

select      1+t0.n+1000*t1.n                                as i
           ,floor(rand(cast (newid() as varbinary))*1000)   as j
           ,floor(rand(cast (newid() as varbinary))*1000)   as k 

into        t

from        t t0,t t1 

option       (maxrecursion 0)
;

create index t_j on t (j);
create index t_k on t (k);

update statistics t (t_j)
update statistics t (t_k)

Scan

select      *
from        t (forcescan)
where       j = 123
        or  k = 456
  • This is straightforward.

在此输入图像描述

Seek

select      *
from        t (forceseek)
where       j = 123
        or  k = 456
  • "Index Seek" : Each index is being seeked for the relevant values (123 and 456)
  • "Merge Join" : The results (row IDs) are being concatenated (as in UNION ALL)
  • "Stream Aggregate" : Duplicate row IDs are being eliminated
  • "Rid Lookup" + "Nested Loops" : The row IDs are being used to retrieve the rows from the table (t)

在此输入图像描述

Use two separate indexes, one on (email) and one on (phone, email) .

The OR is rather difficult. If your conditions were connected by AND rather than OR , then your index would be used for the first query (but not the third, because phone is not the first key in the index).

You can write the query as:

SELECT *
FROM Person 
WHERE Email = '...' 
UNION ALL
SELECT *
FROM Person 
WHERE Email <> '...' AND Phone = '...';

SQL Server should use the appropriate index for each subquery.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM