How to improve performance on a clustered index seek

I'm trying to improve the performance of a query that is running very slowly. After going through the Actual Execution Plan, I found that a Clustered Index Seek was taking up 82% of the cost. Is there any way for me to improve the performance of an Index Seek?

Index:

/****** Object:  Index [IX_Stu]    Script Date: 12/28/2009 11:11:43 ******/
CREATE CLUSTERED INDEX [IX_Stu] ON [dbo].[stu] 
(
 [StuKey] ASC
)WITH (PAD_INDEX  = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

Table (some columns omitted for brevity):

CREATE TABLE [dbo].[stu](
 [StuCertKey] [int] IDENTITY(1,1) NOT NULL,
 [StuKey] [int] NULL,
 CONSTRAINT [PK_Stu] PRIMARY KEY NONCLUSTERED 
(
 [StuCertKey] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]

I'm generalizing here, but...

A clustered index seek is, for the most part, the best-case scenario. The only ways I can think of to improve performance would be:

  • Update the query to return fewer rows/columns, if possible;
  • Defragment or rebuild the index;
  • Partition the index across multiple disks/servers.
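For the defragment/rebuild option, a minimal T-SQL sketch, assuming SQL Server 2005 or later and the dbo.stu / IX_Stu names from the question. The 5% and 30% thresholds in the comments are a commonly quoted rule of thumb and an assumption here, not something stated in this answer.

```sql
-- Measure fragmentation of every index on dbo.stu in the current database.
-- 'LIMITED' is the cheapest scan mode; NULL, NULL means all indexes and partitions.
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.stu'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;

-- Then pick ONE of the following for the clustered index IX_Stu.
-- Moderate fragmentation (rule of thumb: about 5-30%): reorganize in place.
ALTER INDEX IX_Stu ON dbo.stu REORGANIZE;

-- Heavy fragmentation (rule of thumb: above about 30%): rebuild from scratch.
ALTER INDEX IX_Stu ON dbo.stu REBUILD;
```

A rebuild also refreshes the statistics of the index it rebuilds; a reorganize does not.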

If it's only returning 138 rows, and it's that slow... maybe it's being blocked by some other process?如果它只返回 138 行,而且速度很慢……也许它被其他进程阻止了? Are you testing this in isolation, or are other users/processes online at the same time?您是单独进行测试,还是其他用户/进程同时在线? Or maybe it's even a hardware problem, like a disk failure.或者甚至可能是硬件问题,例如磁盘故障。
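One way to test the blocking theory is to look at the running requests from a second session while the slow query executes. This sketch assumes SQL Server 2005 or later and the VIEW SERVER STATE permission.

```sql
-- Run in a second session while the slow query is executing.
-- Any row returned is a request waiting on a lock held by another session.
SELECT r.session_id,
       r.blocking_session_id,   -- session holding the lock
       r.wait_type,
       r.wait_time,             -- milliseconds spent in the current wait
       r.wait_resource,
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

If this returns nothing while the query is still slow, blocking is probably not the cause.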

Clustered index seeks occur when non-clustered indexes are used (to fetch the remaining columns of each matching row), and they aren't necessarily bad.

Consider the following query:

SELECT s.StuKey, s.Name, s.Address, s.City, s.State FROM stu s WHERE State='TX'

If there is only a clustered index on StuKey, then SQL Server has only one option: it must scan the entire table looking for rows where State='TX' and return those rows.

If you add a non-clustered index on State:

CREATE INDEX IX_Stu_State ON stu (State)

Now SQL Server has a new option. It can choose to seek using the non-clustered index, which will produce the rows where State='TX'. However, to return the remaining columns in the SELECT, it has to look up those columns by doing a clustered index seek for each row.

If you want to reduce the clustered index seeks, you can make your index "covering" by including extra columns in it:

 CREATE INDEX IX_Stu_State2 ON stu (State) INCLUDE (Name, Address, City)

This index now contains all the columns needed to answer the query above (StuKey is the clustering key, so every non-clustered index row already carries it). The query will do an index seek to return only the rows where State='TX', and the additional columns can be read straight from the non-clustered index, so the clustered index seeks go away.
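To see the difference rather than take it on trust, you could compare logical reads for the two indexes. A sketch, assuming both IX_Stu_State and IX_Stu_State2 exist and that stu has the Name, Address, City and State columns from this answer's example query (they are not in the table definition posted in the question). The index hints are only there to force each plan for the comparison.

```sql
-- Report logical/physical reads per table for each following statement.
SET STATISTICS IO ON;

-- Narrow index: expect an index seek plus one clustered index seek
-- (lookup) per matching row to fetch Name, Address and City.
SELECT s.StuKey, s.Name, s.Address, s.City, s.State
FROM dbo.stu AS s WITH (INDEX(IX_Stu_State))
WHERE s.State = 'TX';

-- Covering index: expect a single index seek with no lookups.
SELECT s.StuKey, s.Name, s.Address, s.City, s.State
FROM dbo.stu AS s WITH (INDEX(IX_Stu_State2))
WHERE s.State = 'TX';

SET STATISTICS IO OFF;
```

The "logical reads" figures for stu in the Messages output should be noticeably lower for the second statement when many rows match.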

A clustered index range seek that returns 138 rows is not your problem.

Technically you can improve the seek performance by making the clustered index narrower:

Both can have quite a dramatic impact on range seek time, as they reduce the IO and the need to hit physical reads. Of course, as usual, the result will vary with a large number of other factors, such as which columns you project (evicting a projected column into the BLOB allocation unit may actually have adverse effects on certain queries). As a side note, fragmentation usually has only a marginal impact on such a short range scan. Again, it depends.

But as I said, I highly doubt this is your true problem. You have posted only selected parts of the plan and the results of your own analysis. The true root cause may lie completely elsewhere.

Thoughts...

  • Why is IX_Stu clustered? Internally, SQL Server adds a 4-byte "uniqueifier" to non-unique clustered indexes to tell duplicate keys apart. What is the justification? This also bloats your non-clustered PK, because its rows carry the clustering key.

  • What is the actual query you are running?

  • Finally, why FILLFACTOR 80%?
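If the reason for the clustered index is simply "StuKey is how we look rows up", it may be worth checking whether StuKey is unique in practice. A sketch, assuming the dbo.stu / IX_Stu definitions from the question; whether a unique key is valid for this data is an assumption only you can confirm.

```sql
-- 1. Does StuKey actually contain duplicates? (All NULLs form one group here.)
SELECT StuKey, COUNT(*) AS row_count
FROM dbo.stu
GROUP BY StuKey
HAVING COUNT(*) > 1;

-- 2. If step 1 returns no rows, recreate IX_Stu as UNIQUE so SQL Server
--    no longer needs a uniqueifier. DROP_EXISTING replaces it in one step.
CREATE UNIQUE CLUSTERED INDEX [IX_Stu] ON [dbo].[stu] ([StuKey] ASC)
WITH (DROP_EXISTING = ON) ON [PRIMARY];
```

A unique index allows at most one NULL, which matches how step 1 groups NULLs. Changing the clustered index this way also causes the non-clustered indexes (including PK_Stu) to be rebuilt, so expect it to take a while on a large table.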

Edit:

  • A "normal" FILLFACTOR would be 90%, but this is only a rule of thumb.

  • An 11-join query? That's most likely your problem. What are your JOINs, WHERE clauses, etc.? What does the full text plan look like?
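For reference, one way to produce the text plan asked for above, assuming SQL Server 2005 or later and SSMS or sqlcmd (GO is their batch separator, not a T-SQL statement). The SELECT is a hypothetical placeholder for the real 11-join query, and the final rebuild applies the 90% rule of thumb to the PK from the question.

```sql
-- Return the estimated plan as text instead of executing the query.
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO
-- Placeholder: paste the real query here.
SELECT StuCertKey, StuKey FROM dbo.stu WHERE StuKey = 42;
GO
SET SHOWPLAN_TEXT OFF;
GO

-- Optional: rebuild the nonclustered PK with the more common 90% fill factor.
ALTER INDEX [PK_Stu] ON [dbo].[stu] REBUILD WITH (FILLFACTOR = 90);
GO
```

If you want the actual plan with real per-operator row counts, SET STATISTICS PROFILE ON executes the query and returns that instead.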

Some general advice: when I have to do query optimization, I start by writing out what I think the execution plan should be.

Once I've decided what I think the execution plan should be, I try to make the actual query fit this plan. The techniques to do this are different for each DBMS and do not necessarily transfer from one to another, or sometimes even between different versions of the same DBMS.

The thing to keep in mind is that the DBMS can execute only one join at a time: it starts with two initial tables, joins those, and then takes the result of that operation and joins it to the next table. The goal at each step is to minimize the number of rows in the intermediate result set (more correctly, to minimize the number of blocks that have to be read to produce the intermediate results, but this generally means the fewest rows).
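In SQL Server, one technique for making a query follow a hand-written plan is the FORCE ORDER query hint, which keeps the join order as written in the FROM clause. A sketch only: dbo.course, dbo.enrollment, CourseKey and CourseCode are hypothetical tables and columns standing in for the real ones, and forcing the order can make things worse, so treat it as an experiment.

```sql
-- Intended plan: filter the small table first, then join outward so each
-- intermediate result stays as small as possible.
SELECT s.StuCertKey, s.StuKey
FROM dbo.course AS c                                       -- step 1: few rows after filter
JOIN dbo.enrollment AS e ON e.CourseKey = c.CourseKey      -- step 2
JOIN dbo.stu AS s ON s.StuKey = e.StuKey                   -- step 3: seeks on IX_Stu
WHERE c.CourseCode = 'CS101'
OPTION (FORCE ORDER);  -- keep the join order exactly as written above
```

If the forced version is faster, that suggests the optimizer's estimates are off, which often points back to statistics.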

What happens if you hard-code your WHERE criteria, like this:

SELECT StuCertKey, StuKey FROM stu 
WHERE stuKey in (/* list 50 values of StuKey here */)

If it's still very slow, you have an internal problem of some kind. If it's faster, then the index isn't your bottleneck; it's the JOINs that you're doing to create the WHERE filter.
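A variation on the same test that avoids typing the keys by hand is to save them into a temp table, then time the two halves separately. A sketch: dbo.enrollment and CourseKey = 7 are hypothetical stand-ins for whatever joins actually produce your StuKey values.

```sql
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

-- Step 1: run only the JOIN/WHERE logic that produces the StuKey values.
SELECT DISTINCT e.StuKey
INTO #keys
FROM dbo.enrollment AS e
WHERE e.CourseKey = 7;

-- Step 2: run only the clustered index seek, driven by the saved keys.
SELECT s.StuCertKey, s.StuKey
FROM dbo.stu AS s
WHERE s.StuKey IN (SELECT k.StuKey FROM #keys AS k);

SET STATISTICS TIME OFF;
DROP TABLE #keys;
```

If step 2 is fast and step 1 is slow, the time is going into the joins rather than the index seek.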

Note that SELECT * can be very slow if there are many large columns, and especially if there are BLOBs.
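To find out whether the table has such columns, you can query the catalog views. A sketch, assuming SQL Server 2005 or later; max_length = -1 marks the (max) types and xml.

```sql
-- List large-object columns in dbo.stu: varchar(max), nvarchar(max),
-- varbinary(max) and xml report max_length = -1; text/ntext/image are legacy LOBs.
SELECT c.name AS column_name,
       t.name AS type_name,
       c.max_length
FROM sys.columns AS c
JOIN sys.types AS t
  ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.stu')
  AND (c.max_length = -1 OR t.name IN (N'text', N'ntext', N'image'));
```

If any show up, listing only the needed columns instead of SELECT * keeps them out of the query.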

Have you tried some maintenance on this index, such as defragmenting it? It seems really strange that it costs THAT much (120.381). An index seek is the fastest index operation and shouldn't take that long. Can you post the query?

Check the index statistics.

Recalculating the clustered index statistics will solve the problem.

In my case, I was looking for 30 records among 40M records. The execution plan showed it going through the clustered index, but it took about 200 ms, and I had not defragmented the index. After recalculating its statistics, it completes in under 10 ms!
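A minimal sketch of checking and refreshing the statistics, assuming the dbo.stu / IX_Stu names from the question. FULLSCAN reads every row, which is the most accurate option but can take a while on a 40M-row table; a SAMPLE clause is the cheaper alternative.

```sql
-- When were the statistics on dbo.stu last updated?
SELECT st.name AS stats_name,
       STATS_DATE(st.object_id, st.stats_id) AS last_updated
FROM sys.stats AS st
WHERE st.object_id = OBJECT_ID(N'dbo.stu');

-- Recompute the statistics of the clustered index by reading every row.
UPDATE STATISTICS dbo.stu IX_Stu WITH FULLSCAN;
```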

Rebuild the index and recalculate the statistics?

The only other way I can think of to speed it up is to partition the table, which may or may not be possible.
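A rough sketch of what partitioning on StuKey could look like. The boundary values and the pf_StuKey / ps_StuKey names are hypothetical, and table partitioning required Enterprise edition until SQL Server 2016 SP1. It only helps when queries filter on StuKey, so that SQL Server can skip the partitions that cannot match.

```sql
-- Hypothetical boundaries: split StuKey into ranges of 1,000,000 keys.
CREATE PARTITION FUNCTION pf_StuKey (int)
AS RANGE RIGHT FOR VALUES (1000000, 2000000, 3000000);

-- All partitions on PRIMARY for simplicity; each could instead map to
-- its own filegroup on a separate disk.
CREATE PARTITION SCHEME ps_StuKey
AS PARTITION pf_StuKey ALL TO ([PRIMARY]);

-- Move the existing clustered index (and so the table) onto the scheme.
CREATE CLUSTERED INDEX [IX_Stu] ON [dbo].[stu] ([StuKey] ASC)
WITH (DROP_EXISTING = ON)
ON ps_StuKey ([StuKey]);
```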
