SQL Server Indexing Doubts

Indexing is used to improve the performance of SQL queries, but I have always found it a little difficult to decide when I should use an index and when I should not. I want to clarify some of my doubts regarding non-clustered indexes.

  1. What is a non-clustered index key? My book says that each index row of a non-clustered index contains a non-clustered key value. Does that mean it is the column on which we created the non-clustered index? That is, if I create an index on empname varchar(50), is the non-clustered key that empname?

  2. Why is it preferable to create an index on a narrow column? Is it because comparing wider values takes the SQL Server engine more time, or because the page size is fixed, so a wider column means fewer index rows per page (node), which adds levels of intermediate nodes to the hierarchy?

  3. If a table contains multiple non-clustered columns, is the non-clustered key a combination of all these columns, or does SQL Server internally generate some unique ID, together with a locator that points to the actual data row? If possible, please illustrate this with some real-world examples and diagrams.

  4. Why is it said that a column with non-repeating values is a good candidate for an index? Even if the column contains repeated values, I would expect an index to improve performance, because once the search reaches a given key value, its locator immediately finds the actual row.

  5. If the column used in the index is not unique, how does the index find the actual data row in the table?

Please refer me to any book or tutorial that would help clear up my doubts.

First, I think we need to cover what an index actually is. In an RDBMS, indexes are usually implemented using a variant of the B-tree (the B+ tree is the most common). To put it briefly: think of a search tree, like a binary search tree but with many keys per node, optimized for being stored on disk. The result of looking up a key in the B-tree is usually the primary key of the table. In SQL Server specifically, a non-clustered index row stores the index key plus a row locator, which is the clustered index key if the table has a clustered index, or a row ID (RID) if the table is a heap. That means that if a lookup in the index completes and we need more data than is present in the index, we can do a seek in the table using that locator.
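To make the "look up the key, get a locator, then fetch the row" idea concrete, here is a minimal toy model in Python. It is only a sketch under loud assumptions: a sorted list stands in for the leaf level of the B-tree, a dict stands in for the table, and the table contents and the names `index_seek` and `lookup` are made up for illustration. It is not how SQL Server stores anything internally.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: primary key -> row. All names and values are made up.
table = {
    1: {"empname": "Meera", "dept": "Ops"},
    2: {"empname": "Arun", "dept": "IT"},
    3: {"empname": "Meera", "dept": "HR"},
    4: {"empname": "Zoe", "dept": "IT"},
}

# The index: (key value, locator) pairs kept in key order.
# A sorted list stands in for the leaf level of the B-tree.
index = sorted((row["empname"], pk) for pk, row in table.items())
keys = [key for key, _ in index]


def index_seek(value):
    """Return the locators of every index entry whose key equals value."""
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    return [pk for _, pk in index[lo:hi]]


def lookup(value):
    """Seek the index, then fetch each full row from the table by locator."""
    return [table[pk] for pk in index_seek(value)]


print(index_seek("Meera"))  # two entries share the key, each with its own locator
print(lookup("Zoe"))
```

Note that the key does not have to be unique in this model: "Meera" appears twice in the index, and each entry carries its own locator back to a different row.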

Please remember that when we think about performance for an RDBMS, we usually measure it in disk accesses rather than CPU time (I am deliberately ignoring locking and other issues here).

An index being non-clustered means that the way the data in the table is physically stored has no relation to the index key, whereas a clustered index specifies that the data in the table is sorted by (clustered on) the index key. This is why there can be only one clustered index per table.
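The difference can be sketched with plain Python lists. This is a toy illustration only: the rows, the column names, and the variable names are assumptions, and list positions stand in for physical storage locations.

```python
# Hypothetical rows as (emp_id, empname) tuples, in arrival order.
rows = [(3, "Meera"), (1, "Zoe"), (2, "Arun")]

# Clustered on emp_id: the rows themselves are stored in key order.
# The data can only be laid out in one order, hence one clustered index.
clustered_on_id = sorted(rows, key=lambda r: r[0])

# Non-clustered on empname: a separate sorted structure of
# (key, position) pairs; the table's own order is left untouched.
nonclustered_on_name = sorted(
    (name, pos) for pos, (_, name) in enumerate(clustered_on_id)
)

print(clustered_on_id)
print(nonclustered_on_name)
```

Any number of extra structures like `nonclustered_on_name` could be built over different columns, because each is just a sorted copy of the keys with pointers back to the rows.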

2) Back to our model of measuring performance: if the index key is narrow (fits into a small number of bytes), more keys fit into each block of data we read from disk, so a lookup in the B-tree needs fewer disk reads and is much faster when measured in disk I/O.
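A rough back-of-the-envelope calculation shows the effect. The numbers below are assumptions chosen for illustration (about 8000 usable bytes per page, 10 bytes of per-entry overhead for the locator and bookkeeping, completely full pages), and `entries_per_page` and `tree_levels` are hypothetical helper names, so treat the output as an order-of-magnitude sketch rather than what SQL Server would actually build.

```python
PAGE_BYTES = 8000     # assumption: roughly the usable part of an 8 KB page
ENTRY_OVERHEAD = 10   # assumption: per-entry bytes for locator and bookkeeping


def entries_per_page(key_bytes):
    """How many index entries fit on one page for a given key width."""
    return PAGE_BYTES // (key_bytes + ENTRY_OVERHEAD)


def tree_levels(row_count, key_bytes):
    """Number of B-tree levels (root to leaf) needed to index row_count rows."""
    fanout = entries_per_page(key_bytes)
    levels = 1
    capacity = fanout
    while capacity < row_count:
        capacity *= fanout
        levels += 1
    return levels


for width in (4, 50, 500):
    print(width, entries_per_page(width), tree_levels(10_000_000, width))
```

Under these assumptions, a 4-byte key needs 3 levels for ten million rows, a 50-byte key needs 4, and a 500-byte key needs 6. Each extra level is one more page read per lookup.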

3) I tried to explain this above. Unfortunately, I don't have any graphs or drawings to illustrate it; hopefully someone else can come along and share some.

4) If you're running a query like this:

SELECT something, something_else FROM sometable t1 WHERE akey = 'some value'

on a table with an index defined like this:

CREATE INDEX idx_sometable_akey ON sometable(akey)

If sometable has a lot of rows where akey is equal to 'some value', that means a lot of lookups, both in the index and in the actual table, to retrieve the values of something and something_else. Whereas if there's a good chance that the filter returns only a few rows, that also means fewer disk accesses.
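The trade-off can be sketched with a very crude cost model. Everything here is an assumption made for illustration: a 3-level index, 100 rows per table page, a worst case where every matching row sits on a different page, and no caching. The function names are hypothetical, and a real query optimizer uses far more detailed statistics.

```python
def index_plan_reads(matching_rows, index_levels=3):
    """Page reads for one index seek plus one table lookup per matching row."""
    return index_levels + matching_rows


def scan_plan_reads(total_rows, rows_per_page=100):
    """Page reads for reading the whole table once."""
    return -(-total_rows // rows_per_page)  # ceiling division


def cheaper_plan(total_rows, matching_rows):
    """Pick whichever plan needs fewer page reads under this model."""
    seek = index_plan_reads(matching_rows)
    scan = scan_plan_reads(total_rows)
    return "index" if seek < scan else "scan"


print(cheaper_plan(1_000_000, 50))       # few matches
print(cheaper_plan(1_000_000, 500_000))  # half the table matches
```

With a million rows, 50 matches cost 53 reads through the index against 10,000 for a scan, while 500,000 matches cost far more through the index than simply reading the table. That is why columns with few repeated values make better index candidates.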

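5) See the explanation at the top: every index entry carries its own locator back to its row, so several entries can share the same key value.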

Hope this helps :)
