简体   繁体   中英

Why NonClustered index scan faster than Clustered Index scan?

As I know, heap tables are tables without clustered index and has no physical order. I have a heap table "scan" with 120k rows and I am using this select:

SELECT id FROM scan

If I create a non-clustered index for the column "id", I get 223 physical reads . If I remove the non-clustered index and alter the table to make "id" my primary key (and so my clustered index), I get 515 physical reads .

If the clustered index table is something like this picture:

在此输入图像描述

Why Clustered Index Scans workw like the table scan? (or worse in case of retrieving all rows). Why it is not using the "clustered index table" that has less blocks and already has the ID that I need?

SQL Server indices are b-trees. A non-clustered index just contains the indexed columns, with the leaf nodes of the b-tree being pointers to the approprate data page. A clustered index is different: its leaf nodes are the data page itself and the clustered index's b-tree becomes the backing store for the table itself; the heap ceases to exist for the table.

Your non-clustered index contains a single, presumably integer column. It's a small, compact index to start with. Your query select id from scan has a covering index : the query can be satisfied just by examining the index, which is what is happening. If, however, your query included columns not in the index, assuming the optimizer elected to use the non-clustered index, an additional lookup would be required to fetch the data pages required, either from the clustering index or from the heap.

To understand what's going on, you need to examine the execution plan selected by the optimizer:

A clustered index generally is about as big as the same data in a heap would be (assuming the same page fullness). It should use just a little more reads than a heap would use because of additional B-tree levels.

A CI cannot be smaller than a heap would be. I don't see why you would think that. Most of the size of a partition (be it a heap or a tree) is in the data.

Note, that less physical reads does not necessarily translate to a query being faster. Random IO can be 100x slower than sequential IO.

When to use Clustered Index-

Query Considerations:
1) Return a range of values by using operators such as BETWEEN, >, >=, <, and <= 2) Return large result sets
3) Use JOIN clauses; typically these are foreign key columns
4) Use ORDER BY, or GROUP BY clauses. An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.

Column Considerations : Consider columns that have one or more of the following attributes: 1) Are unique or contain many distinct values 2) Defined as IDENTITY because the column is guaranteed to be unique within the table 3) Used frequently to sort the data retrieved from a table

Clustered indexes are not a good choice for the following attributes: 1) Columns that undergo frequent changes 2) Wide keys

When to use Nonclustered Index-

Query Considerations:
1) Use JOIN or GROUP BY clauses. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
2) Queries that do not return large result sets
3) Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches

Column Considerations :
Consider columns that have one or more of the following attributes:
1) Cover the query. For more information, see Index with Included Columns
2) Lots of distinct values, such as a combination of last name and first name, if a clustered index is used for other columns
3) Used frequently to sort the data retrieved from a table

Database Considerations:
1) Databases or tables with low update requirements, but large volumes of data can benefit from many nonclustered indexes to improve query performance.
2) Online Transaction Processing applications and databases that contain heavily updated tables should avoid over-indexing. Additionally, indexes should be narrow, that is, with as few columns as possible.

Try running

DBCC DROPCLEANBUFFERS

Before the queries...

If you really want to compare them. Physical reads don't mean the same as logical reads when optimizing a query

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM