Query Performance in SQL Server

Question

I have a SQL Server table with over 11 million records. These records are organized by "Category" and "Platform". I am stumped by the following scenario ...

SELECT COUNT(*) FROM TableName WHERE Category = 'session' AND Platform = 'windows';
-- Returns 1261500

SELECT COUNT(*) FROM TableName WHERE Category = 'session' AND Platform = 'linux';
-- Returns 1890599

So there are over 600K more records associated with 'linux' than 'windows'.

However, this query returns in 6-9 seconds ...

SELECT MAX(id) FROM TableName WHERE Category = 'session' AND Platform = 'linux';

Yet this one I have to kill after waiting over 13 minutes for a result ...

SELECT MAX(id) FROM TableName WHERE Category = 'session' AND Platform = 'windows';

Oh ... I also have the following index on the table ...

CREATE NONCLUSTERED INDEX [IX_TableName_CategoryPlatform] ON [dbo].[TableName]
(
    [Platform] ASC,
    [Category] ASC,
    [CreateDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

Whiskey, Tango, Foxtrot?

Why does the search term make a difference, particularly since there is an index in place?

UPDATE

I have just made the following observation ...

SELECT MAX(id) FROM TableName WHERE Platform = 'windows';

By dropping the Category from the query, the response is returned very quickly ...

UPDATE 2

I have created a couple of execution plans as requested. One thing I noticed, however, is that the percentages in the plans generated by the "Paste The Plan" utility differ from what I am seeing in SSMS, so below each link I am including the percentages shown in Management Studio.

For the following Query (which works) ...

SELECT MAX([MessageID]) [MaxID] FROM [BoothComm].[UniversalMessageQueue] WHERE [MessagePlatform]='windows';

https://www.brentozar.com/pastetheplan/?id=Sk9q59CqZ

0% : Select
0% : Stream Aggregate
0% : Top
100% : Index Scan

For the next query (which doesn't work), I can only provide an ESTIMATED execution plan.

SELECT 
   MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE 
 MessageCategory = 'session'
 AND
 MessagePlatform = 'windows'

https://www.brentozar.com/pastetheplan/?id=r1zqnq09-

0% : Select
0% : Stream Aggregate
0% : Top
0% : Nested Loops (Inner Join) -- Why is this there??
21% : Index Scan
79% : Key Lookup -- Also new and seems to want to take up more time than anything else

(thanks for all the help!)

UPDATE 3

So after all of the below conversation and changes made I am still left with the question ...

Why does this query return in under 1 second (thanks to adding the ID to the index) ...

SELECT 
      MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE 
    MessagePlatform = 'linux'
    AND
    MessageCategory = 'accounting'

And this one takes 13-22 seconds to run ...

SELECT 
      MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE 
    MessagePlatform = 'windows'
    AND
    MessageCategory = 'accounting'

Same table, same indexes, and the execution plans are exactly the same. Everything is identical except for the MessagePlatform value, and the value responsible for the latency appears on fewer records than the other.
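The revised index ("adding the ID to the index") is not shown in the post. As a rough sketch only, one form it might take is below; the index name and the column order are assumptions, not the actual definition.

```sql
-- Assumption: the post does not show the revised index, so this is only one plausible form.
-- The index name is hypothetical; table and column names come from the queries above.
CREATE NONCLUSTERED INDEX IX_UniversalMessageQueue_PlatformCategoryID
ON BoothComm.UniversalMessageQueue
(
    MessagePlatform ASC,   -- equality predicate in the WHERE clause
    MessageCategory ASC,   -- equality predicate in the WHERE clause
    MessageID ASC          -- the column being aggregated with MAX()
);
```

With both filter columns leading the key and MessageID last, the largest MessageID for a given platform/category pair sits at the end of that key range, so the index alone can answer the query without a Key Lookup. Whether the optimizer actually chooses that access path still has to be confirmed from the actual execution plan.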

Answer 1

Your queries are slow because the table is not normalized. You should not be storing Category and Platform as strings on every record. Instead, they should be in lookup tables with an integer primary key. These keys would then be stored in your main table, with an appropriate nonclustered index on each one. Then you should add a clustered index to your main table on a column that makes sense to keep sorted in ascending order (preferably a unique integer).
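A minimal sketch of the layout described above might look like the following. Every table, column, and constraint name here is hypothetical, and the column types are guesses, since the original table definition is not shown.

```sql
-- Hypothetical names and types throughout; adapt to the real schema.

-- Lookup tables: one row per distinct value, keyed by an integer.
CREATE TABLE dbo.Category (
    CategoryID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Category PRIMARY KEY CLUSTERED,
    Name varchar(50) NOT NULL
        CONSTRAINT UQ_Category_Name UNIQUE
);

CREATE TABLE dbo.Platform (
    PlatformID int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Platform PRIMARY KEY CLUSTERED,
    Name varchar(50) NOT NULL
        CONSTRAINT UQ_Platform_Name UNIQUE
);

-- Main table: integer foreign keys instead of repeated strings,
-- clustered on a unique, ascending integer.
CREATE TABLE dbo.MessageQueue (
    id bigint IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_MessageQueue PRIMARY KEY CLUSTERED,
    CategoryID int NOT NULL
        CONSTRAINT FK_MessageQueue_Category REFERENCES dbo.Category (CategoryID),
    PlatformID int NOT NULL
        CONSTRAINT FK_MessageQueue_Platform REFERENCES dbo.Platform (PlatformID),
    CreateDate datetime2 NOT NULL
);

-- Nonclustered indexes on each key, as suggested above.
CREATE NONCLUSTERED INDEX IX_MessageQueue_CategoryID ON dbo.MessageQueue (CategoryID);
CREATE NONCLUSTERED INDEX IX_MessageQueue_PlatformID ON dbo.MessageQueue (PlatformID);
```

Queries would then filter on CategoryID and PlatformID, joining to the lookup tables only when the names are needed.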

As to the actual problem you are encountering: if you have no clustered index defined, the data is stored in a heap (i.e., an unsorted pile of data). The index you have will help, but performance is hampered by the fact that you are using strings as keys, and from the looks of it these strings are not very selective (many repeats). SQL Server may simply be deciding to do a full scan to answer your query because it estimates that a scan will be faster than any other method.
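To check whether the table really is a heap, something along these lines should work. It uses the placeholder table name from the question (dbo.TableName), so substitute the real schema and table.

```sql
-- List the table's indexes. A row with type_desc = 'HEAP' (index_id 0)
-- means there is no clustered index on the table.
-- dbo.TableName is the placeholder name used in the question.
SELECT
    i.index_id,
    i.name,
    i.type_desc,
    i.is_unique
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.TableName')
ORDER BY i.index_id;
```

If the first row shows CLUSTERED instead of HEAP, the table already has a clustered index and the heap explanation does not apply.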
