
SQL Server making use of Non Clustered Index despite having a Clustered Index

I have two indexes on a table called Shopper.

Clustered index:

CREATE CLUSTERED INDEX [CI_EMail_ShopperNumID] 
ON [dbo].[Shopper] ([EMail] ASC, [ShopperNumID] ASC)

Non-clustered index:

CREATE NONCLUSTERED INDEX [nci_wi_Shopper_D8E9A1BB0660D0838F923BB8587C7115] 
ON [dbo].[Shopper] ([EMail] ASC)
INCLUDE ([DateCreated], [FirstName], [LastLoginDate], [LastName],
    [MaxEmailVolume], [ShopperNumID], [ShopperSourceCD], [ShopperSourceOther]) 

I run a very simple SELECT:

SELECT ShopperNumID
FROM shopper
WHERE Email = '87.kl@abcxyz.com'

On analyzing the Execution Plan, I notice that the non-clustered index is being used:

[Screenshot: execution plan]

Now, I drop the non-clustered index:

DROP INDEX IF EXISTS [nci_wi_Shopper_D8E9A1BB0660D0838F923BB8587C7115] 
ON [dbo].[Shopper]
GO

and re-run my SELECT to find that the clustered index is (finally) being used:

[Screenshot: execution plan showing the clustered index in use]

Can someone please explain why the (bulky) non-clustered index is being used by the optimization engine, instead of the (preferred) clustered index?

Microsoft SQL Server 2016 (RTM-GDR) (KB3194716) - 13.0.1722.0 (X64)
Developer Edition (64-bit) on Windows 10 Pro 6.3 (Build 14393:)

UPDATE: Based on the input received, I evaluated this further by creating another non-clustered index on the table, very similar to the existing clustered index.

CREATE NONCLUSTERED INDEX [NCI_EMail_ShopperNumID] 
ON [dbo].[Shopper] ([EMail] ASC, [ShopperNumID] ASC)

Currently, the table has 3 indexes that can support my SELECT:

  1. CLUSTERED INDEX [CI_EMail_ShopperNumID]
  2. NONCLUSTERED INDEX [nci_wi_Shopper_D8E9A1BB0660D0838F923BB8587C7115]
  3. NONCLUSTERED INDEX [NCI_EMail_ShopperNumID]

Now, when I run the same SELECT:

SELECT ShopperNumID
FROM shopper
WHERE Email = '87.kl@abcxyz.com'

and analyze the execution plan, I notice that the newly created non-clustered index is being used:

[Screenshot: execution plan]

Seems like the optimizer is adamant about using a Non Clustered Index, no matter what!

The non-clustered index is being used because it is optimised for looking up a row based on Email.

You might think that it is bulky, but the fact that it is keyed on Email makes it ideal for your query, even if it includes every column in the table.

What you may not realise is that the clustered index is just as bulky, because it implicitly includes every field in the table. So in the worst case scenario (don't design something like this) you have both indexes keyed on Email and both contain every column. The optimiser could choose to use either, really.

This script shows how much space is actually used by each clustered and nonclustered index:

-- Space used by each index on user tables in the current database.
-- Catalog column names are lowercase so the query also runs under a
-- case-sensitive collation.
SELECT o.name AS TableOrViewName,
       i.name AS IndexName,
       i.type_desc AS IndexType,
       i.index_id AS IndexOrdinal,
       s.name AS SchemaName,
       p.rows AS RowCounts,
       p.data_compression_desc AS CompressionType,
       SUM(a.total_pages) * 8 / 1024.0 AS ObjectSpaceMB,  -- pages are 8 KB each
       SUM(a.used_pages) * 8 / 1024.0 AS UsedSpaceMB
FROM sys.objects AS o
LEFT JOIN sys.indexes AS i ON o.object_id = i.object_id
JOIN sys.partitions AS p ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.allocation_units AS a ON p.partition_id = a.container_id
LEFT JOIN sys.schemas AS s ON o.schema_id = s.schema_id
WHERE o.name NOT LIKE 'dt%'
  AND o.is_ms_shipped = 0      -- skip objects shipped with SQL Server
  AND i.object_id > 255
GROUP BY o.name,
         i.name,
         i.type_desc,
         i.index_id,
         s.name,
         p.data_compression_desc,
         p.rows;

Basically, it is six of one, half a dozen of the other.

Both your clustered index and non-clustered index have b-tree structures for the email address. So, either can find the matching email address(es) very quickly.

How, then, does the optimizer choose which to fetch? Well, in both cases, if there is one record then one page (either a data page or index leaf page) is fetched. Perhaps it is arbitrary that the non-clustered index is chosen.

However, the optimizer does not know in advance exactly how many records an email address matches. Hence, it must make a decision based on the estimated number of matches. If the non-clustered index only had the two columns, then this would be a no-brainer: each index page would hold more records (because a "record" is only two columns), so the records matching the email would be on fewer pages.

In your case, though, the non-clustered index is a covering index with all columns. Perhaps more of these fit on an index page than a data page (there is some overhead on data pages and it might be more than the overhead on an index page).

So, where have we gotten? The basic operations are searching through the b-tree (which is the same for both index types) and then reading the records that match. Under most circumstances, the two index structures will be pretty equivalent in these operations. SQL Server might have a slight preference for the non-clustered index because more records fit on an index page than on a data page (this is a guess).
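Rather than rely on the guess, you can compare how many leaf-level pages each index uses and how large its records are. The following is a minimal sketch using sys.dm_db_index_physical_stats against the dbo.Shopper table from the question. It assumes you run it in the database that holds the table. DETAILED mode reads every page, so it may be slow on a large table.

```sql
-- Leaf-level page and record statistics for every index on dbo.Shopper.
-- Assumption: the current database is the one that contains dbo.Shopper.
SELECT i.name AS IndexName,
       i.type_desc AS IndexType,
       ps.page_count,
       ps.record_count,
       ps.avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Shopper'), NULL, NULL, 'DETAILED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id
WHERE ps.index_level = 0                         -- leaf level only
  AND ps.alloc_unit_type_desc = 'IN_ROW_DATA'    -- ignore LOB / row-overflow pages
ORDER BY ps.page_count;
```

An index with fewer leaf pages for the same number of records packs more records per page, which is the property the guess above depends on.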

From MSDN, "Clustered and Nonclustered Indexes Described": Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.

The nonclustered index is covering (including) the additional specified columns, so that it does not need to go back to the table when referencing any of the included columns. See MSDN: "Create Indexes with Included Columns". Effectively, the nonclustered index is like creating a new table with the included columns, sorted by the index columns.

With respect to your query, the clustered and nonclustered indexes are very nearly identical, the only difference being that the clustered index is additionally sorted by [ShopperNumID]. Perhaps the query optimizer is picking the nonclustered index because it is a nominally better fit. In this case, the better fit does not necessarily mean better performance.

Assuming that the clustered and nonclustered indexes are both located on the same storage medium, your nonclustered index takes up space but provides no added performance value.
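To see whether either index actually costs less for this query, one option is to force each index in turn with a table hint and compare the logical reads reported by SET STATISTICS IO. This is a diagnostic sketch only, using the index names from the question. It assumes both indexes currently exist, and the hints are not meant to stay in production code.

```sql
SET STATISTICS IO ON;

-- Force the clustered index.
SELECT ShopperNumID
FROM dbo.Shopper WITH (INDEX([CI_EMail_ShopperNumID]))
WHERE EMail = '87.kl@abcxyz.com';

-- Force the covering non-clustered index.
SELECT ShopperNumID
FROM dbo.Shopper WITH (INDEX([nci_wi_Shopper_D8E9A1BB0660D0838F923BB8587C7115]))
WHERE EMail = '87.kl@abcxyz.com';

SET STATISTICS IO OFF;
```

The logical reads for each statement appear in the Messages output. If the two numbers are the same or nearly so, the optimizer's choice between the indexes makes little practical difference for this query.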

First, compliments on looking at the query plan to see which index is being used. The query optimizer tries to minimize IO, but it can do some funny things. Generally speaking, non-clustered indexes are smaller than clustered indexes, so if the optimizer can see that the non-clustered index can answer the query using fewer reads, it will choose that index, and that is the answer to your question. The exception would be if the non-clustered index included all of the columns from the table. I suspect this might be the point of your question.

While there are definitely use cases where it makes sense to use a string in your clustered index, remember that the clustered index key is always included in each non-clustered index. You want your clustered index to be small and selective, if not unique. It looks like ShopperNumID would meet these criteria, but we don't have your full table. Consider dropping the email address from your clustered index.
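As a rough sketch of that suggestion, the clustered index could be recreated on ShopperNumID alone. The index name CI_ShopperNumID is hypothetical, and the index is not declared UNIQUE because the question does not say whether ShopperNumID is unique. Dropping and recreating a clustered index rebuilds the table and its non-clustered indexes, so this is best tried on a test copy first.

```sql
-- Assumption: ShopperNumID is narrow and selective enough to cluster on.
DROP INDEX [CI_EMail_ShopperNumID] ON [dbo].[Shopper];

-- CI_ShopperNumID is a hypothetical name chosen for this example.
CREATE CLUSTERED INDEX [CI_ShopperNumID]
ON [dbo].[Shopper] ([ShopperNumID] ASC);
```

With this layout, any non-clustered index keyed on EMail carries ShopperNumID automatically as the clustering key, so the SELECT from the question can still be answered from that index alone.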

If your application needs to look up records based on the email address, creating the smallest fully covering index for the columns you need is going to give you the best performance, which is what nci_wi_Shopper_D8E9A1BB0660D0838F923BB8587C7115 appears to be.
