
Why this index doesn't improve query performance

Platform : SQL Server 2012

Background : I have two fairly large log tables, around 600k records each, that are joined on a PK/FK relationship. For the sake of argument, let's call them ReallyBigLog1 and ReallyBigLog2. The query (below) takes about 3.5 seconds to run. The WHERE clause includes three different values. When asked to help improve this query, I immediately noticed that the columns in the WHERE clause were not indexed. I smugly suggested adding indexes, assuming the increased performance would make me look like a hero. However, the additional indexes had no measurable effect.

Question : Given the query below, why does indexing StartTime, EndTime, and DateStamp have no measurable effect on query time?

Query

SELECT
    IrreleventField1,
    IrreleventField2,
    IrreleventField3....
FROM [dbo].[ReallyBigLog1] AS [T1]
INNER JOIN [dbo].[ReallyBigLog2] AS [T2] ON [T1].[Id] = [T2].[Id]
WHERE ([T1].[EndTime] IS NOT NULL) AND ([T1].[StartTime] IS NOT NULL) AND ([T2].[DateStamp] >= '2017-5-16 00:00:00')

Indexes

CREATE NONCLUSTERED INDEX [ix_RecommendedIndex]
ON [dbo].[ReallyBigLog1]
([StartTime] , [EndTime])

CREATE NONCLUSTERED INDEX [IX_DateStamp]
ON [dbo].[ReallyBigLog2]
([DateStamp])

Execution Plan

5 SELECT            
    4 Compute Scalar        
        3 Merge Join  / Inner Join Merge:([dbo].[ReallyBigLog1].[Id] [T2]=[dbo].[ReallyBigLog1].[Id] [T1]), Residual:([dbo].[ReallyBigLog2].[Id] as [T2].[Id]=[dbo].[ReallyBigLog1].[Id] as [T1].[Id])  
            1 Clustered Index Scan Predicate:([dbo].[ReallyBigLog1].[StartTime] as [T1].[StartTime] IS NOT NULL AND [dbo].[ReallyBigLog1].[EndTime] as [T1].[EndTime] IS NOT NULL), ORDERED FORWARD [dbo].[ReallyBigLog1].[PK_dbo.ReallyBigLog1] [T1]
            2 Clustered Index Scan Predicate:([dbo].[ReallyBigLog2].[DateStamp] as [T2].[DateStamp]>='2017-05-16 00:00:00.000'), ORDERED FORWARD [dbo].[ReallyBigLog2].[PK_dbo.ReallyBigLog2] [T2]

EDIT (Tables Composition)

SELECT
  (SELECT COUNT(*) FROM ReallyBigLog1 WHERE StartTime IS NULL) as NullStartTime,
  (SELECT COUNT(*) FROM ReallyBigLog1 WHERE EndTime IS NULL) as NullEndTime,
  (SELECT COUNT(*) FROM ReallyBigLog1) as Log1Count,
  (SELECT COUNT(*) FROM ReallyBigLog2 WHERE DateStamp > '2017-5-16 00:00:00') AS DateStampUsage,
  (SELECT COUNT(*) FROM ReallyBigLog2) AS Log2Count

DateStampUsage  Log2Count   NullStartTime   NullEndTime  Log1Count
443038          651929      33748           34144        509545

ix_RecommendedIndex will be of very little help unless you have a lot of NULLs.

Here, the indexes that really matter are the ones on Id and IX_DateStamp. Since a lot of your data appears to match the WHERE clause, the optimiser prefers a clustered index scan (to merge on the Ids).

One possibility to make it faster would be to cluster the table on DateStamp instead, but that will have performance side effects on other queries, and should be stress-tested in a test environment first.
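As a sketch, switching the clustered index to DateStamp would look something like the following, assuming the current clustered index is the primary key on Id (the constraint name is taken from the execution plan, but treat all names here as hypothetical):

```sql
-- Assumes PK_dbo.ReallyBigLog2 is currently the clustered index on Id.
-- Rebuilding a clustered index rewrites the whole table; test on a copy first.
ALTER TABLE [dbo].[ReallyBigLog2] DROP CONSTRAINT [PK_dbo.ReallyBigLog2];

-- Cluster the table on DateStamp so the range predicate becomes a range scan.
CREATE CLUSTERED INDEX [CIX_ReallyBigLog2_DateStamp]
    ON [dbo].[ReallyBigLog2] ([DateStamp]);

-- Re-add the primary key as a nonclustered index.
ALTER TABLE [dbo].[ReallyBigLog2]
    ADD CONSTRAINT [PK_dbo.ReallyBigLog2] PRIMARY KEY NONCLUSTERED ([Id]);
```

Note that DateStamp need not be unique for a clustered index, but every nonclustered index on the table will be rebuilt as part of this change, which is why it needs testing first.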

If you can provide the actual execution plan with statistics, it may help produce a better diagnosis.

edit: With the statistics provided, I don't see how you can make it faster with indexes alone. There is simply too much data to scan (more than half of both tables). You are hitting the point where you may need to consolidate your data separately, in another table, or optimize the data at the binary level (smaller record size for faster scans).

Since you're fetching most of the rows in the tables, the indexes have to be covering (i.e., contain every column your query needs from that table) to help you at all -- and even then the improvement might not be much.

The reason the indexes don't really help is that you're reading most of the rows, and your query also selects the IrreleventField columns. Since a nonclustered index contains only the index key plus the clustered key, the rest of the columns must be fetched from the table (i.e., the clustered index) using the clustered index key. That's called a key lookup, and it can be very costly because it has to be done for every single row found in the index that matches your search criteria.

To make the index covering, you can add the "irrelevant" fields to the INCLUDE part of the index, if you want to see whether that improves the situation.
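As a sketch, a covering version of the suggested index would look like this (the included column names are the placeholders from the question, not the real schema):

```sql
-- Covering variant of ix_RecommendedIndex: the key columns stay the same,
-- but the selected columns ride along at the leaf level of the index,
-- so the plan no longer needs a key lookup into the clustered index.
CREATE NONCLUSTERED INDEX [ix_RecommendedIndex_Covering]
ON [dbo].[ReallyBigLog1] ([StartTime], [EndTime])
INCLUDE ([IrreleventField1], [IrreleventField2], [IrreleventField3]);
```

INCLUDE columns don't affect the sort order of the index, so they make it covering without widening the key.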

Having an index on the date and time alone is not going to help as much. You should have an index that covers your join conditions as well, such as the Id columns. Since your query primarily filters on the timestamp of the T2 alias, I would offer the following indexes

table           index
ReallyBigLog2   (DateStamp, Id)
ReallyBigLog1   (Id, EndTime, StartTime)
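In T-SQL, those two indexes would be created roughly as follows (the index names are hypothetical):

```sql
-- Leading on DateStamp for the range predicate, with Id for the join.
CREATE NONCLUSTERED INDEX [IX_ReallyBigLog2_DateStamp_Id]
    ON [dbo].[ReallyBigLog2] ([DateStamp], [Id]);

-- Leading on Id for the join, with the time columns for the NULL checks.
CREATE NONCLUSTERED INDEX [IX_ReallyBigLog1_Id_Times]
    ON [dbo].[ReallyBigLog1] ([Id], [EndTime], [StartTime]);
```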

And here is why. You are specifically looking for transactions in T2 after a given date, so the ReallyBigLog2 index STARTS with DateStamp as its basis. It then ALSO includes the Id column for the JOIN to log table 1. Both parts of the query against this table are covered by the index, so the engine does not yet need to go to the data pages to get these fields for comparison.

Now, the index for T1. It starts with Id, which gives an immediate match (or not) against the T2 table. With EndTime and StartTime also part of the index, it again does not have to go to the raw data pages to qualify the WHERE / JOIN criteria.

Once that is all done, it has the qualifying set of records, and only then goes to the data pages for those rows to pull the rest of the details you need.

from
   [dbo].[ReallyBigLog2] AS [T2]
      JOIN [dbo].[ReallyBigLog1] AS [T1]
         ON [T1].[Id] = [T2].[Id]
         AND ([T1].[EndTime] IS NOT NULL) 
         AND ([T1].[StartTime] IS NOT NULL) 
where
   [T2].[DateStamp] >= '2017-5-16 00:00:00'
