SQL Server why is index not used with OR

Question

I've been studying indexes and trying to understand how they work and how I can use them to boost performance, but I'm missing something.

I have the following table:

Person :

| Id | Name | Email | Phone |
| 1  | John |  E1   |  P1   |
| 2  | Max  |  E2   |  P2   |

I'm trying to find the best way to index the columns Email and Phone considering that the queries will (most of the time) be of the form

[1] SELECT * FROM Person WHERE Email = '...' OR Phone = '...'
[2] SELECT * FROM Person WHERE Email = ...
[3] SELECT * FROM Person WHERE Phone = ...

I thought the best approach would be to create a single index using both columns:

CREATE NONCLUSTERED INDEX [IX_EmailPhone]
ON [dbo].[Person]([Email], [PhoneNumber]);

However, with the index above, only the query [2] benefits from an index seek, the others use index scan.

I also tried to create multiple index: one with both columns, one for email, and one for email. In this case, [2] and [3] use seek, but [1] continues to use scan.

Why can't the database use index with an or? What would be the best indexing approach for this table considering the queries?

Answer 1

Create a separate index for each column.
By using hints we can force the optimizer to use/not use the indexes, so you can check the execution plan, get a feeling of the performance involved and understand the meaning of each path.

Go through my demo and consider the work involved in each path for the following scenarios -

Only few rows satisfy the condition j=123.
Only few rows satisfy the condition k=456.
Most of the rows satisfy the condition j=123.
Most of the rows satisfy the condition k=456.
Only few rows satisfy the condition j=123.
Most of the rows satisfy the condition k=456.

Try to think what path you would have chosen for each scenario.
Please feel free to ask questions.

Demo

;with t(n) as (select 0 union all select n+1 from t where n < 999)

select      1+t0.n+1000*t1.n                                as i
           ,floor(rand(cast (newid() as varbinary))*1000)   as j
           ,floor(rand(cast (newid() as varbinary))*1000)   as k 

into        t

from        t t0,t t1 

option       (maxrecursion 0)
;

create index t_j on t (j);
create index t_k on t (k);

update statistics t (t_j)
update statistics t (t_k)

Scan

select      *
from        t (forcescan)
where       j = 123
        or  k = 456

This is straightforward.

Seek

select      *
from        t (forceseek)
where       j = 123
        or  k = 456

"Index Seek" : Each index is being seeked for the relevant values (123 and 456)
"Merge Join" : The results (row IDs) are being concatenated (as in UNION ALL)
"Stream Aggregate" : Duplicate row IDs are being eliminated
"Rid Lookup" + "Nested Loops" : The row IDs are being used to retrieve the rows from the table (t)

Answer 2

Use two separate indexes, one on (email) and one on (phone, email) .

The OR is rather difficult. If your conditions were connected by AND rather than OR , then your index would be used for the first query (but not the third, because phone is not the first key in the index).

You can write the query as:

SELECT *
FROM Person 
WHERE Email = '...' 
UNION ALL
SELECT *
FROM Person 
WHERE Email <> '...' AND Phone = '...';

SQL Server should use the appropriate index for each subquery.

SQL Server why is index not used with OR

Question

2 answers

solution1
1 2016-11-29 20:36:00

Scan

Seek

solution2
0 2016-11-29 19:53:40

SQL Server why is index not used with OR

Question

2 answers

solution1 1 2016-11-29 20:36:00

Scan

Seek

solution2 0 2016-11-29 19:53:40

solution1
1 2016-11-29 20:36:00

solution2
0 2016-11-29 19:53:40