
Query Performance in SQL Server

I have a SQL Server table with over 11 million records. These records are organized by "Category" and "Platform". I am stumped by the following scenario ...

SELECT COUNT(*) FROM TableName WHERE Category = 'session' AND Platform = 'windows';
-- Returns 1261500

SELECT COUNT(*) FROM TableName WHERE Category = 'session' AND Platform = 'linux';
-- Returns 1890599

So there are over 600K more records associated with 'linux' than with 'windows'.

However, this query returns in 6-9 seconds ...

SELECT MAX(id) FROM TableName WHERE Category = 'session' AND Platform = 'linux';

Yet this one I have to kill after waiting over 13 minutes for a result ...

SELECT MAX(id) FROM TableName WHERE Category = 'session' AND Platform = 'windows';

Oh ... I also have the following index on the table ...

CREATE NONCLUSTERED INDEX [IX_TableName_CategoryPlatform] ON [dbo].[TableName]
(
    [Platform] ASC,
    [Category] ASC,
    [CreateDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
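For comparison, here is a minimal sketch of an index shaped for these MAX(id) queries, assuming the simplified dbo.TableName / id / Category / Platform names used above (the index name is hypothetical). Putting the two equality columns first and id last keeps the rows for each (Platform, Category) pair in id order, so SQL Server can typically seek to that range and read its last entry instead of scanning. The existing index ends in CreateDate rather than id, so it cannot answer MAX(id) by reading a single entry.

```sql
-- Hypothetical index name; columns follow the simplified example above.
-- Equality-filtered columns first, the aggregated column (id) last, so the
-- rows for one (Platform, Category) pair are stored in id order in the index.
CREATE NONCLUSTERED INDEX IX_TableName_Platform_Category_Id
    ON dbo.TableName (Platform ASC, Category ASC, id ASC);
GO

-- With that index, MAX(id) can usually be answered by seeking to the
-- ('windows', 'session') range and reading its highest id.
SELECT MAX(id)
FROM dbo.TableName
WHERE Category = 'session'
  AND Platform = 'windows';
```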

Whiskey, Tango, Foxtrot?

Why does the search term make a difference, particularly since there is an index in place?

UPDATE

I have just made the following observation ...

SELECT MAX(id) FROM TableName WHERE Platform = 'windows';

By dropping the Category from the query, the response is returned very quickly ...

UPDATE 2

I have created a couple of execution plans as requested. The thing I noticed, however, is that the percentages in the plans generated by the "Paste The Plan" utility differ from what I am getting in SSMS, so below each link I am including the percentages that I am seeing in Management Studio.

For the following query (which works) ...

SELECT MAX([MessageID]) [MaxID] FROM [BoothComm].[UniversalMessageQueue] WHERE [MessagePlatform]='windows';

https://www.brentozar.com/pastetheplan/?id=Sk9q59CqZ

  • 0% : Select
  • 0% : Stream Aggregate
  • 0% : Top
  • 100% : Index Scan

For the next query (which doesn't work), I can only provide an ESTIMATED execution plan.

SELECT 
   MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE 
 MessageCategory = 'session'
 AND
 MessagePlatform = 'windows'

https://www.brentozar.com/pastetheplan/?id=r1zqnq09-

  • 0% : Select
  • 0% : Stream Aggregate
  • 0% : Top
  • 0% : Nested Loops (Inner Join) -- Why is this there??
  • 21% : Index Scan
  • 79% : Key Lookup -- Also new, and seems to take up more time than anything else
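The Top operator sitting above a Nested Loops join with a Key Lookup suggests (an interpretation, not something the estimated plan states outright) that SQL Server has rewritten MAX as "read rows in descending MessageID order and stop at the first match", looking up MessageCategory for every row it reads. If the newest 'windows'/'session' row is far down that order, the walk and its lookups can run for a very long time. A sketch of one experiment, assuming SQL Server 2016 SP1 or later (required for USE HINT): run the same query with row-goal optimization disabled and compare the I/O and timing figures in the Messages tab.

```sql
-- Report logical reads and elapsed time per statement in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Same slow query, but asking the optimizer not to plan around
-- "stop at the first matching row" (row goal). Needs SQL Server 2016 SP1+.
SELECT MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE MessageCategory = 'session'
  AND MessagePlatform = 'windows'
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If this version finishes quickly with a different plan shape (for example a scan feeding a plain aggregate, with no Top), that supports the row-goal explanation.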

(thanks for all the help!)

UPDATE 3

So after all of the conversation below and the changes made, I am still left with the question ...

Why does this query return in under 1 second (thanks to adding the ID to the index) ...

SELECT 
      MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE 
    MessagePlatform = 'linux'
    AND
    MessageCategory = 'accounting'

And this one takes 13-22 seconds to run ...

SELECT 
      MAX(MessageID) AS [MaxID]
FROM BoothComm.UniversalMessageQueue
WHERE 
    MessagePlatform = 'windows'
    AND
    MessageCategory = 'accounting'

Same table, same indexes, and the execution plans are exactly the same. Everything is identical except for the MessagePlatform value, and the value responsible for the latency appears on fewer records than the other.
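If the plan really reads rows in descending MessageID order and stops at the first match, what matters is not how many rows match but how far below the top of the MessageID range the newest match sits. A diagnostic sketch to check that, using the column names from the queries above (the output column names are illustrative); it aggregates the whole table, so expect it to take a while on 11 million rows:

```sql
-- For each platform: how many 'accounting' rows match, where the newest one is,
-- and how many rows of the whole table have a higher MessageID than that newest
-- match (roughly what a descending MessageID walk must step over first).
WITH PerPlatform AS (
    SELECT MessagePlatform,
           COUNT(*)       AS MatchingRows,
           MAX(MessageID) AS HighestID
    FROM BoothComm.UniversalMessageQueue
    WHERE MessageCategory = 'accounting'
      AND MessagePlatform IN ('linux', 'windows')
    GROUP BY MessagePlatform
)
SELECT p.MessagePlatform,
       p.MatchingRows,
       p.HighestID,
       (SELECT COUNT(*)
        FROM BoothComm.UniversalMessageQueue AS q
        WHERE q.MessageID > p.HighestID) AS RowsAboveHighestID
FROM PerPlatform AS p;
```

A small RowsAboveHighestID for 'linux' and a very large one for 'windows' would match the timings, regardless of which platform has more rows overall.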

Your queries are slow because the table is not normalized. You should not be storing Category and Platform as strings on every record. Instead, they should be in lookup tables with an integer primary key. These keys would then be stored in your main table, with an appropriate nonclustered index on each one. Then you should add a clustered index to your main table on a column that makes sense to keep sorted in ascending order (preferably a unique integer).
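A minimal sketch of the normalization described above, assuming the BoothComm.UniversalMessageQueue names from the question. The lookup-table names (BoothComm.PlatformLookup, BoothComm.CategoryLookup), index names, and varchar(50) sizes are hypothetical and should be adjusted to the real data:

```sql
-- Hypothetical lookup tables: one row per distinct string value.
CREATE TABLE BoothComm.PlatformLookup (
    PlatformID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    PlatformName varchar(50)       NOT NULL UNIQUE
);
CREATE TABLE BoothComm.CategoryLookup (
    CategoryID   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    CategoryName varchar(50)       NOT NULL UNIQUE
);
GO

-- Populate the lookups from the existing string values.
INSERT INTO BoothComm.PlatformLookup (PlatformName)
SELECT DISTINCT MessagePlatform FROM BoothComm.UniversalMessageQueue
WHERE MessagePlatform IS NOT NULL;

INSERT INTO BoothComm.CategoryLookup (CategoryName)
SELECT DISTINCT MessageCategory FROM BoothComm.UniversalMessageQueue
WHERE MessageCategory IS NOT NULL;

-- Add integer foreign-key columns to the main table.
ALTER TABLE BoothComm.UniversalMessageQueue
    ADD PlatformID int NULL REFERENCES BoothComm.PlatformLookup (PlatformID),
        CategoryID int NULL REFERENCES BoothComm.CategoryLookup (CategoryID);
-- The new columns must exist before the next batch is compiled.
GO

-- Backfill the integer keys from the string columns.
UPDATE q
SET    q.PlatformID = p.PlatformID,
       q.CategoryID = c.CategoryID
FROM   BoothComm.UniversalMessageQueue AS q
JOIN   BoothComm.PlatformLookup AS p ON p.PlatformName = q.MessagePlatform
JOIN   BoothComm.CategoryLookup AS c ON c.CategoryName = q.MessageCategory;

-- One nonclustered index per key column.
CREATE NONCLUSTERED INDEX IX_UMQ_PlatformID ON BoothComm.UniversalMessageQueue (PlatformID);
CREATE NONCLUSTERED INDEX IX_UMQ_CategoryID ON BoothComm.UniversalMessageQueue (CategoryID);
```

On a table this size the single UPDATE generates a lot of transaction log; running it in batches (for example by MessageID ranges) is usually safer.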

As to the actual problem you are encountering: if you have no clustered index defined, the data is stored in a heap (i.e. an unsorted pile of data). The index you have will help, but performance is hampered by the fact that you are using strings as keys, and from the looks of it these strings are not very selective (many repeated values). SQL Server may simply be deciding to do a full scan to answer your query, because it estimates that this will be faster than any other method.
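Before adding a clustered index, it is worth confirming that the table really is a heap: in sys.indexes, index_id 0 is the heap and index_id 1 is the clustered index. (A Key Lookup like the one in the Update 2 plan normally indicates that a clustered index already exists; lookups into a heap appear as RID Lookup instead.) A sketch, assuming MessageID is unique; the clustered index name is hypothetical:

```sql
-- index_id 0 = HEAP, index_id 1 = CLUSTERED
SELECT i.index_id, i.type_desc, i.name
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'BoothComm.UniversalMessageQueue')
  AND i.index_id IN (0, 1);

-- Only if the query above reports HEAP: cluster the table on the unique integer ID.
-- This rewrites the whole table and rebuilds all of its nonclustered indexes.
CREATE UNIQUE CLUSTERED INDEX CIX_UniversalMessageQueue_MessageID
    ON BoothComm.UniversalMessageQueue (MessageID);
```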
