Clustered vs Nonclustered for a Reference table

I have a simple product table that keeps track of product data. Most of the time I don't need to know what type of product it is, but every once in a while I need the product type. Since not all products even have a type (which would leave a lot of NULL values in the product table), I use a reference table and join to it when I need that information. The reference table uses a composite key, and what I'm trying to figure out is whether its primary key should be a clustered or a nonclustered index. The product table has a clustered index on its primary key, so I was wondering whether the join would be more efficient if the reference table's key were also clustered (so that both tables keep the IDs in the same order). Or is that ordering ignored during the join, so that the nonclustered index would be more efficient because it doesn't do a key lookup?

-- Reference table: one row per (product, type) pair; products without a type have no row here
CREATE TABLE [dbo].[sales_product_type]
(
    [FK_product_id] [int] NOT NULL,
    [product_type] [int] NOT NULL,
    [type_description] [nvarchar](max) NULL,

    CONSTRAINT [PK_sales_product_type] 
        PRIMARY KEY CLUSTERED ([FK_product_id] ASC, [product_type] ASC)
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[sales_product]
(
    [product_id] [int] IDENTITY(1,1) NOT NULL,
    [FK_store_id] [int] NOT NULL,
    [price] [int] NOT NULL,
    [product_name] [nvarchar](max) NOT NULL,
    [units] [int] NULL,

    CONSTRAINT [PK_sales_product] 
        PRIMARY KEY CLUSTERED ([product_id] ASC)
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

If you need the [type_description] column when you query for the product type, you should go with the clustered index. The reason is that the leaf level of the clustered index contains all columns of the table (including the key columns FK_product_id and product_type).

On the other hand, if you had only a nonclustered index on product ID and product type, then whenever a query needs type_description it would have to do a lookup into the heap (shown as an RID Lookup in the execution plan) for every row in the result set.

So if you need type_description in the result, you should keep the clustered index.
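For comparison, here is a minimal sketch of the nonclustered alternative. The table name dbo.sales_product_type_nc and its constraint name are hypothetical, chosen so that the variant can sit next to the original table. Because the table has no clustered index, its data is stored as a heap, and that is why reading type_description after seeking on the primary key needs the extra lookup described above.

```sql
-- Hypothetical variant of dbo.sales_product_type: same columns, but the
-- composite primary key is NONCLUSTERED, so the table data itself is a heap.
CREATE TABLE dbo.sales_product_type_nc
(
    FK_product_id    int           NOT NULL,
    product_type     int           NOT NULL,
    type_description nvarchar(max) NULL,

    CONSTRAINT PK_sales_product_type_nc
        PRIMARY KEY NONCLUSTERED (FK_product_id ASC, product_type ASC)
);

-- List the table's storage structures: expect index_id 0 (HEAP)
-- and a second row for the NONCLUSTERED primary key.
SELECT i.index_id, i.name, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.sales_product_type_nc');
```

If you seek on the key and select type_description against this variant, the actual execution plan will typically show an Index Seek followed by an RID Lookup.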


In your particular scenario, however, the choice matters little if type_description values are larger than 8,000 bytes (about 4,000 characters, since nvarchar uses two bytes per character for most text). As discussed here (and here), a value larger than 8,000 bytes is stored out of row, so with either index the engine has to perform an extra lookup to read it.
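To check where type_description values actually end up, you can look at the table's allocation units. This is a sketch that assumes a hypothetical scratch table, dbo.type_desc_storage_demo, with the same shape as the reference table. It also assumes the 'large value types out of row' table option is at its default (off). Under those conditions the short value stays in the data row, while the 10,000-character value (20,000 bytes) goes to LOB_DATA pages.

```sql
-- Hypothetical scratch table with the same shape as dbo.sales_product_type.
CREATE TABLE dbo.type_desc_storage_demo
(
    FK_product_id    int           NOT NULL,
    product_type     int           NOT NULL,
    type_description nvarchar(max) NULL,
    CONSTRAINT PK_type_desc_storage_demo
        PRIMARY KEY CLUSTERED (FK_product_id, product_type)
);

-- One short value that fits in the row, and one 10,000-character value.
-- The CAST to nvarchar(max) stops REPLICATE from truncating at 8,000 bytes.
INSERT INTO dbo.type_desc_storage_demo (FK_product_id, product_type, type_description)
VALUES (1, 1, N'short description'),
       (2, 1, REPLICATE(CAST(N'x' AS nvarchar(max)), 10000));

-- Pages per allocation unit type: IN_ROW_DATA holds the rows,
-- LOB_DATA holds values that did not fit in the row.
SELECT au.type_desc, SUM(au.total_pages) AS total_pages
FROM sys.partitions AS p
JOIN sys.allocation_units AS au
    ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
    OR (au.type = 2 AND au.container_id = p.partition_id)
WHERE p.object_id = OBJECT_ID(N'dbo.type_desc_storage_demo')
GROUP BY au.type_desc;
```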


If you don't query type_description very often, a nonclustered index might result in far fewer reads. The index on (FK_product_id, product_type) does not contain type_description, so each of its pages holds many more rows and the engine has fewer pages to read. But I would test both approaches before deciding on one.
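Here is a sketch of such a test. It uses hypothetical scratch tables (dbo.test_product, dbo.test_type_clustered, dbo.test_type_nonclustered) and made-up data. The same LEFT JOIN from products to the reference table runs once against a clustered version of the key and once against a nonclustered version, and SET STATISTICS IO reports the logical reads for each. The numbers you get will depend on your own data and server.

```sql
-- Hypothetical stand-in for dbo.sales_product (only the columns the join needs).
CREATE TABLE dbo.test_product
(
    product_id   int           NOT NULL,
    product_name nvarchar(100) NOT NULL,
    CONSTRAINT PK_test_product PRIMARY KEY CLUSTERED (product_id)
);

-- Two copies of the reference table that differ only in whether
-- the composite primary key is clustered.
CREATE TABLE dbo.test_type_clustered
(
    FK_product_id    int           NOT NULL,
    product_type     int           NOT NULL,
    type_description nvarchar(max) NULL,
    CONSTRAINT PK_test_type_clustered
        PRIMARY KEY CLUSTERED (FK_product_id, product_type)
);

CREATE TABLE dbo.test_type_nonclustered
(
    FK_product_id    int           NOT NULL,
    product_type     int           NOT NULL,
    type_description nvarchar(max) NULL,
    CONSTRAINT PK_test_type_nonclustered
        PRIMARY KEY NONCLUSTERED (FK_product_id, product_type)
);

-- Made-up data: 10,000 products; every even-numbered one has a type
-- with a 200-character description (small enough to stay in-row).
INSERT INTO dbo.test_product (product_id, product_name)
SELECT n, CONCAT(N'product ', n)
FROM (SELECT TOP (10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b) AS numbers;

INSERT INTO dbo.test_type_clustered (FK_product_id, product_type, type_description)
SELECT product_id, 1 + product_id % 5, REPLICATE(N'x', 200)
FROM dbo.test_product
WHERE product_id % 2 = 0;

INSERT INTO dbo.test_type_nonclustered (FK_product_id, product_type, type_description)
SELECT FK_product_id, product_type, type_description
FROM dbo.test_type_clustered;

-- Logical reads per table are reported in the Messages output.
SET STATISTICS IO ON;

-- Join that needs only the type (key columns only).
SELECT p.product_id, t.product_type
FROM dbo.test_product AS p
LEFT JOIN dbo.test_type_clustered AS t ON t.FK_product_id = p.product_id;

SELECT p.product_id, t.product_type
FROM dbo.test_product AS p
LEFT JOIN dbo.test_type_nonclustered AS t ON t.FK_product_id = p.product_id;

-- Join that also needs type_description.
SELECT p.product_id, t.product_type, t.type_description
FROM dbo.test_product AS p
LEFT JOIN dbo.test_type_clustered AS t ON t.FK_product_id = p.product_id;

SELECT p.product_id, t.product_type, t.type_description
FROM dbo.test_product AS p
LEFT JOIN dbo.test_type_nonclustered AS t ON t.FK_product_id = p.product_id;

SET STATISTICS IO OFF;
```

The nonclustered primary key is itself a B-tree sorted by FK_product_id, so both variants can supply rows in product ID order. The main difference is whether type_description requires a separate lookup. Compare the actual execution plans alongside the read counts to see which join strategy the optimizer chose for each variant.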

In general, I would always have a clustered index on the table. If required, I might add nonclustered indexes to tune particular queries.
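Suppose some queries look up all products of a given type. An extra nonclustered index on product_type could serve them while the primary key stays clustered. This is a sketch that uses a hypothetical scratch table and index name. Because the table is clustered, each row of a nonclustered index also carries the clustering key (FK_product_id, product_type). The query below can therefore be answered from the narrow index without reading the wide rows.

```sql
-- Hypothetical scratch copy of the reference table, clustered on its key.
CREATE TABLE dbo.test_type_tuned
(
    FK_product_id    int           NOT NULL,
    product_type     int           NOT NULL,
    type_description nvarchar(max) NULL,
    CONSTRAINT PK_test_type_tuned
        PRIMARY KEY CLUSTERED (FK_product_id, product_type)
);

-- Hypothetical extra index for "which products have type X?" queries.
-- Its rows also contain the clustering key, so FK_product_id is covered.
CREATE NONCLUSTERED INDEX IX_test_type_tuned_product_type
    ON dbo.test_type_tuned (product_type);

DECLARE @type int = 3;

SELECT FK_product_id
FROM dbo.test_type_tuned
WHERE product_type = @type;
```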

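Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM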