Why is a NonClustered index scan faster than a Clustered Index scan?

As far as I know, heap tables are tables without a clustered index and have no physical order. I have a heap table "scan" with 120k rows and I am using this select:

SELECT id FROM scan

If I create a non-clustered index on the column "id", I get 223 physical reads. If I remove the non-clustered index and alter the table to make "id" my primary key (and so my clustered index), I get 515 physical reads.

If the clustered index table is something like this picture:

[image: diagram of a clustered index structure]

Why do Clustered Index Scans work like a table scan (or worse, when retrieving all rows)? Why is it not using the "clustered index table", which has fewer blocks and already has the ID that I need?

SQL Server indices are B-trees. A non-clustered index contains just the indexed columns, with each leaf-level entry also carrying a row locator that points to the corresponding row in the heap or the clustered index. A clustered index is different: its leaf nodes are the data pages themselves, and the clustered index's B-tree becomes the backing store for the table itself; the heap ceases to exist for the table.

Your non-clustered index contains a single, presumably integer, column. It's a small, compact index to start with. For your query select id from scan it is a covering index: the query can be satisfied just by examining the index, which is what is happening. If, however, your query included columns not in the index then, assuming the optimizer elected to use the non-clustered index, an additional lookup would be required to fetch the data pages required, either from the clustered index or from the heap.
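The difference between a covered and a non-covered query can be sketched as follows. This is only an illustration: the dbo schema, the index name IX_scan_id and the extra column payload are assumptions, since the question only tells us that the table has an id column.

```sql
-- Assumptions (not from the original post): the table lives in the dbo
-- schema and has a second, wider column called payload.
CREATE NONCLUSTERED INDEX IX_scan_id ON dbo.scan (id);

-- Covered: id is the only column needed and it is in IX_scan_id,
-- so the optimizer can scan the small index and never touch the table.
SELECT id FROM dbo.scan;

-- Not covered: payload is not in the index. If the optimizer uses
-- IX_scan_id to find the row, it must then follow the row locator
-- into the heap (or the clustered index) to fetch payload.
SELECT id, payload FROM dbo.scan WHERE id = 42;
```

On a heap the extra step shows up in the plan as a RID Lookup, and on a table with a clustered index as a Key Lookup. The optimizer may also decide that scanning the whole table is cheaper than doing lookups.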

To understand what's going on, you need to examine the execution plan selected by the optimizer.
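One way to look at the plan without running the query is SET SHOWPLAN_TEXT, sketched here for the query from the question. The dbo schema is an assumption, and GO is the batch separator used by SSMS and sqlcmd rather than a T-SQL statement.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- Not executed while SHOWPLAN_TEXT is ON; SQL Server returns the
-- estimated plan as text instead of the result rows.
SELECT id FROM dbo.scan;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

The operator named in the output (Table Scan, Index Scan or Clustered Index Scan) shows which structure the optimizer chose to read.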

A clustered index is generally about as big as the same data in a heap would be (assuming the same page fullness). It should use just a few more reads than a heap would, because of the additional B-tree levels.

A CI cannot be smaller than a heap would be. I don't see why you would think that. Most of the size of a partition (be it a heap or a tree) is in the data.
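The sizes can be checked directly instead of inferred from read counts. A minimal sketch, assuming the table is dbo.scan and that you have permission to read the dynamic management views:

```sql
-- One row per heap / index partition of the table, with its page count.
SELECT i.name AS index_name,      -- NULL for a heap
       i.type_desc,               -- HEAP, CLUSTERED or NONCLUSTERED
       ps.used_page_count,        -- 8 KB pages used by this structure
       ps.row_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.scan');
```

Running this once with the non-clustered index in place and once after switching to the clustered primary key should show the narrow index using far fewer pages than the structure that holds the full rows.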

Note that fewer physical reads do not necessarily translate to a faster query. Random IO can be 100x slower than sequential IO.

When to use Clustered Index-

Query Considerations:
1) Return a range of values by using operators such as BETWEEN, >, >=, <, and <=
2) Return large result sets
3) Use JOIN clauses; typically these are foreign key columns
4) Use ORDER BY or GROUP BY clauses. An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.

Column Considerations:
Consider columns that have one or more of the following attributes:
1) Are unique or contain many distinct values
2) Defined as IDENTITY, because the column is guaranteed to be unique within the table
3) Used frequently to sort the data retrieved from a table

Clustered indexes are not a good choice for the following attributes:
1) Columns that undergo frequent changes
2) Wide keys
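As an illustration of the guidelines above, here is a sketch with a narrow, unique, IDENTITY clustering key and a range query that benefits from it. The table dbo.Orders and all of its columns are hypothetical and do not come from the original post.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE dbo.Orders (
    OrderID    int IDENTITY(1,1) NOT NULL,
    CustomerID int               NOT NULL,
    OrderDate  date              NOT NULL,
    Amount     decimal(10,2)     NOT NULL
);

-- Narrow, unique key that is not updated after insert.
CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderID
    ON dbo.Orders (OrderID);

-- Range predicate and ORDER BY on the clustering key: the rows can be
-- read in key order from the leaf level, so a separate sort is
-- typically not needed.
SELECT OrderID, CustomerID, OrderDate, Amount
FROM dbo.Orders
WHERE OrderID BETWEEN 1000 AND 2000
ORDER BY OrderID;
```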

When to use Nonclustered Index-

Query Considerations:
1) Use JOIN or GROUP BY clauses. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
2) Queries that do not return large result sets
3) Contain columns frequently involved in search conditions of a query, such as a WHERE clause, that return exact matches

Column Considerations:
Consider columns that have one or more of the following attributes:
1) Cover the query. For more information, see Index with Included Columns
2) Lots of distinct values, such as a combination of last name and first name, if a clustered index is used for other columns
3) Used frequently to sort the data retrieved from a table
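A covering nonclustered index with included columns might look like the sketch below. The table dbo.Customers, its columns and the index name are hypothetical, chosen to match the last name / first name example in the list above.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE dbo.Customers (
    CustomerID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    LastName   nvarchar(50)      NOT NULL,
    FirstName  nvarchar(50)      NOT NULL,
    Email      nvarchar(255)     NOT NULL
);

-- Key columns are the ones being searched on; Email is stored only at
-- the leaf level so the index covers the query without widening the key.
CREATE NONCLUSTERED INDEX IX_Customers_Name
    ON dbo.Customers (LastName, FirstName)
    INCLUDE (Email);

-- Exact-match search answered entirely from IX_Customers_Name.
SELECT LastName, FirstName, Email
FROM dbo.Customers
WHERE LastName = N'Smith' AND FirstName = N'Anna';
```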

Database Considerations:
1) Databases or tables with low update requirements but large volumes of data can benefit from many nonclustered indexes to improve query performance.
2) Online Transaction Processing applications and databases that contain heavily updated tables should avoid over-indexing. Additionally, indexes should be narrow, that is, with as few columns as possible.

Try running

DBCC DROPCLEANBUFFERS

before the queries, if you really want to compare them. Physical reads are not the same as logical reads when optimizing a query.
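A cold-cache comparison could be set up roughly like this. It is a sketch for a test system only, since DBCC DROPCLEANBUFFERS empties the buffer pool for the whole instance; the dbo schema is an assumption.

```sql
-- Test systems only: this slows down everything else on the instance.
CHECKPOINT;              -- flush dirty pages so they become clean
DBCC DROPCLEANBUFFERS;   -- remove all clean pages from the buffer pool

SET STATISTICS IO ON;    -- report reads per table in the messages output
SELECT id FROM dbo.scan;
SET STATISTICS IO OFF;
```

The messages output lists logical reads, physical reads and read-ahead reads separately. Logical reads reflect how many pages the plan touches and are the more stable number to compare between the two index layouts, while physical reads depend on what happened to be cached.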
