Why this index doesn't improve query performance

Platform: SQL Server 2012

Background: I have two fairly large log tables, around 600k records each, that are joined using Pk/Fk. For the sake of argument, let's call them ReallyBigLog1 and ReallyBigLog2. The query (below) takes about 3.5 seconds to run. The WHERE clause filters on three different columns. When asked to help improve this query, I immediately noticed that the columns in the WHERE clause were not indexed. I smugly suggested adding indexes, assuming the increased performance would make me look like a hero. However, the additional indexes had no measurable effect.

Question: Given the query below, why does indexing StartTime, EndTime, and DateStamp have no measurable effect on query time?

Query

SELECT
    IrreleventField1,
    IrreleventField2,
    IrreleventField3....
FROM [dbo].[ReallyBigLog1] AS [T1]
INNER JOIN [dbo].[ReallyBigLog2] AS [T2] ON [T1].[Id] = [T2].[Id]
WHERE ([T1].[EndTime] IS NOT NULL)
  AND ([T1].[StartTime] IS NOT NULL)
  AND ([T2].[DateStamp] >= '2017-5-16 00:00:00')

Indexes

CREATE NONCLUSTERED INDEX [ix_RecommendedIndex]
ON [dbo].[ReallyBigLog1]
([StartTime] , [EndTime])

CREATE NONCLUSTERED INDEX [IX_DateStamp]
ON [dbo].[ReallyBigLog2]
([DateStamp])

Execution Plan

5 SELECT            
    4 Compute Scalar        
        3 Merge Join  / Inner Join Merge:([dbo].[ReallyBigLog1].[Id] [T2]=[dbo].[ReallyBigLog1].[Id] [T1]), Residual:([dbo].[ReallyBigLog2].[Id] as [T2].[Id]=[dbo].[ReallyBigLog1].[Id] as [T1].[Id])  
            1 Clustered Index Scan Predicate:([dbo].[ReallyBigLog1].[StartTime] as [T1].[StartTime] IS NOT NULL AND [dbo].[ReallyBigLog1].[EndTime] as [T1].[EndTime] IS NOT NULL), ORDERED FORWARD [dbo].[ReallyBigLog1].[PK_dbo.ReallyBigLog1] [T1]
            2 Clustered Index Scan Predicate:([dbo].[ReallyBigLog2].[DateStamp] as [T2].[DateStamp]>='2017-05-16 00:00:00.000'), ORDERED FORWARD [dbo].[ReallyBigLog2].[PK_dbo.ReallyBigLog2] [T2]

EDIT (Table Composition)

SELECT
  (SELECT COUNT(*) FROM ReallyBigLog1 WHERE StartTime IS NULL) as NullStartTime,
  (SELECT COUNT(*) FROM ReallyBigLog1 WHERE EndTime IS NULL) as NullEndTime,
  (SELECT COUNT(*) FROM ReallyBigLog1) as Log1Count,
  (SELECT COUNT(*) FROM ReallyBigLog2 WHERE DateStamp > '2017-5-16 00:00:00') AS DateStampUsage,
  (SELECT COUNT(*) FROM ReallyBigLog2) AS Log2Count

DateStampUsage  Log2Count   NullStartTime   NullEndTime  Log1Count
443038          651929      33748           34144        509545

ix_RecommendedIndex will help very little unless you have a lot of NULLs.

Here, the indexes that really matter are the ones on the Ids and IX_DateStamp. Since a large share of the rows match the WHERE clause, the optimiser prefers a clustered index scan (to merge on the Ids).
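One way to check whether the optimiser's preference for the scan is justified is to force the nonclustered index with a table hint and compare the timing against the unhinted query. The following is only a diagnostic sketch, not something to leave in production code. It assumes the index name from the question and uses the question's placeholder column names, with the elided part of the column list left out.

```sql
-- Diagnostic only: force IX_DateStamp on ReallyBigLog2, then compare
-- duration and reads with the same query run without the hint.
SELECT
    IrreleventField1,   -- placeholder column names taken from the question
    IrreleventField2,
    IrreleventField3
FROM [dbo].[ReallyBigLog1] AS [T1]
INNER JOIN [dbo].[ReallyBigLog2] AS [T2] WITH (INDEX ([IX_DateStamp]))
    ON [T1].[Id] = [T2].[Id]
WHERE [T1].[EndTime] IS NOT NULL
  AND [T1].[StartTime] IS NOT NULL
  AND [T2].[DateStamp] >= '2017-05-16T00:00:00';
```

If the hinted version is slower, that supports the optimiser's decision to scan the clustered index.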

One possibility to make it faster would be a CLUSTERED index on DateStamp, but it will have performance side effects for other queries and should be stress-tested in a test environment first.

If you can provide the execution plan with statistics (the SQL Server counterpart of EXPLAIN), it may help with a better diagnosis.
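In SQL Server, one way to collect such statistics is to switch on the session-level I/O and timing counters before running the query. The per-table logical reads and the elapsed time then appear in the Messages output. A minimal sketch using the query from the question, again with its placeholder column names:

```sql
-- Report logical/physical reads per table and CPU/elapsed time for this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT
    IrreleventField1,   -- placeholder column names taken from the question
    IrreleventField2,
    IrreleventField3
FROM [dbo].[ReallyBigLog1] AS [T1]
INNER JOIN [dbo].[ReallyBigLog2] AS [T2] ON [T1].[Id] = [T2].[Id]
WHERE [T1].[EndTime] IS NOT NULL
  AND [T1].[StartTime] IS NOT NULL
  AND [T2].[DateStamp] >= '2017-05-16T00:00:00';

-- Switch the counters off again so later statements are not affected.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```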

Edit: With the statistics provided, I don't see how you can make it faster with indexes alone. There is far too much data to read (more than half of each table: roughly 68% of ReallyBigLog2 matches the date filter, and at least 86% of ReallyBigLog1 has both StartTime and EndTime set). You are hitting the point where you may need to consolidate your data separately, in another table, or optimize the data at the binary level (smaller record size for faster scans).

Since you're fetching most of the rows in the tables, the indexes have to be covering (= contain every column your query needs from that table) to help you at all, and even then the improvement might not be much.

The reason the indexes don't really help is that you're reading most of the rows, and you have the IrreleventField columns in your query. Since a nonclustered index contains only the index key plus the clustered key, the rest of the fields must be fetched from the table (= the clustered index) using the clustered index key. That's called a key lookup and it can be very costly, because it has to be done for every single row found in the index that matches your search criteria.

To make the index covering, you can add the "irrelevant" fields to the INCLUDE part of the index, if you want to test whether that improves the situation.
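As a rough sketch of what that could look like: the index names below are made up for illustration, and which table each placeholder column belongs to is an assumption, since the question does not say. The clustered key (Id) does not need to be listed because a nonclustered index already carries it.

```sql
-- Hypothetical covering indexes; adjust the INCLUDE lists to the columns
-- the query actually selects from each table.
CREATE NONCLUSTERED INDEX [IX_StartEnd_Covering]
ON [dbo].[ReallyBigLog1] ([StartTime], [EndTime])
INCLUDE ([IrreleventField1], [IrreleventField2]);   -- assumed to live in ReallyBigLog1

CREATE NONCLUSTERED INDEX [IX_DateStamp_Covering]
ON [dbo].[ReallyBigLog2] ([DateStamp])
INCLUDE ([IrreleventField3]);                       -- assumed to live in ReallyBigLog2
```

The wider the INCLUDE list, the closer the index gets to a second copy of the table, so the gain over a clustered index scan may be small when most rows qualify.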

Having an index on the date and time columns alone is not going to help much. You should have an index that covers the conditions of your joins as well, such as the ID columns. Since your query primarily qualifies rows on the time-stamp of the T2 alias, I would offer the following indexes:

table           index
ReallyBigLog2   (DateStamp, ID )
ReallyBigLog1   (id, endTime, StartTime )

And here is why. You are specifically looking for transactions in T2 on or after a given date. So the index on ReallyBigLog2 STARTS with DateStamp as its basis. Then it ALSO includes the ID column as the basis for the JOIN to log table 1. Both columns needed at this stage are in the index, so the engine does not have to go to the data pages yet to compare them.

Now, the index columns for T1. Start with the ID so that a match (or no match) against the T2 rows is found immediately. With endTime and StartTime also part of the index, the engine again does not have to go to the raw data pages to evaluate the WHERE / JOIN criteria.

Once that is all done, it has the set of qualifying records, goes to the data pages for those, and pulls the rest of the details you need.

from
   [dbo].[ReallyBigLog2] AS [T2]
      JOIN [dbo].[ReallyBigLog1] AS [T1]
         ON [T1].[Id] = [T2].[Id]
         AND ([T1].[EndTime] IS NOT NULL) 
         AND ([T1].[StartTime] IS NOT NULL) 
where
   [T2].[DateStamp] >= '2017-5-16 00:00:00'
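The two indexes suggested in this answer could be created along these lines. The index names are hypothetical, chosen only for illustration; the key columns and their order follow the table in the answer.

```sql
-- ReallyBigLog2: filter on DateStamp first, then carry Id for the join.
CREATE NONCLUSTERED INDEX [IX_ReallyBigLog2_DateStamp_Id]
ON [dbo].[ReallyBigLog2] ([DateStamp], [Id]);

-- ReallyBigLog1: match on Id, then check EndTime / StartTime from the index.
CREATE NONCLUSTERED INDEX [IX_ReallyBigLog1_Id_EndTime_StartTime]
ON [dbo].[ReallyBigLog1] ([Id], [EndTime], [StartTime]);
```

Since most rows in both tables qualify, it is worth measuring the query before and after creating these rather than assuming a gain.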
