
How to structure an index for GROUP BY in SQL Server

The following simple query takes a very long time (several minutes) to execute.

I have an index:

create index IX on [fctWMAUA] (SourceSystemKey, AsAtDateKey)

SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
FROM [fctWMAUA] (NOLOCK) AS [t0]
WHERE SourceSystemKey in (1,2,3,4,5,6,7,8,9)
GROUP BY [t0].[SourceSystemKey]

The statistics are as follows:

  • logical reads 1827978
  • physical reads 1113
  • read aheads 1806459

Taking that exact same query and reformatting it as follows gives me these statistics:

  • logical reads 36
  • physical reads 0
  • read aheads 0

It takes 31ms to execute.

SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 1
 GROUP BY [t0].[SourceSystemKey]
UNION
 SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 2
 GROUP BY [t0].[SourceSystemKey]
UNION
 SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 3
 GROUP BY [t0].[SourceSystemKey]
/* AND SO ON TO 9 */

How do I make an index that does the GROUP BY quickly?

I have found that the best solution is the following. It mimics the UNION version of the query, and runs very quickly.

40 logical reads, and an execution time of 3ms.

SELECT [t3].[value]
FROM [dimSourceSystem] AS [t0]
OUTER APPLY (
    SELECT MAX([t2].[value]) AS [value]
    FROM (
        SELECT [t1].[AsAtDateKey] AS [value], [t1].[SourceSystemKey]
        FROM [fctWMAUA] AS [t1]
        ) AS [t2]
    WHERE [t2].[SourceSystemKey] = ([t0].[SourceSystemKey])
    ) AS [t3]
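
The same idea can be written more compactly by hand. The following is only a sketch, assuming (as the query above implies) that dimSourceSystem holds one row per SourceSystemKey; the aliases s, f and m are made up for illustration. Each outer row drives a separate lookup for that key's maximum, which is what the UNION version does explicitly.

```sql
-- Sketch: one MAX lookup per source system, driven by the dimension table.
-- Assumes dimSourceSystem has one row per SourceSystemKey.
SELECT s.SourceSystemKey AS SourceSystem, m.[Date]
FROM dimSourceSystem AS s
CROSS APPLY (
    -- An aggregate without GROUP BY always returns exactly one row,
    -- so CROSS APPLY and OUTER APPLY behave the same here.
    SELECT MAX(f.AsAtDateKey) AS [Date]
    FROM fctWMAUA AS f
    WHERE f.SourceSystemKey = s.SourceSystemKey
) AS m
WHERE s.SourceSystemKey IN (1,2,3,4,5,6,7,8,9)
  AND m.[Date] IS NOT NULL;  -- drop source systems with no fact rows
```

Unlike the query above, this version also returns the key next to each date and skips source systems that have no rows in fctWMAUA, which matches what the GROUP BY query returns.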

It's difficult to say without looking at an execution plan; however, you might want to try the following:

SELECT * FROM
(
    SELECT MAX(t0.AsAtDateKey) AS [Date], t0.SourceSystemKey AS SourceSystem
    FROM fctWMAUA (NOLOCK) AS t0
    GROUP BY t0.SourceSystemKey
) AS g  -- SQL Server requires an alias on a derived table
WHERE g.SourceSystem in (1,2,3,4,5,6,7,8,9)

It's difficult to tell without looking at an execution plan, but I think what's happening is that SQL Server is not clever enough to realise that the WHERE clause only filters out whole groups and has no effect on which records are included in each group. As soon as SQL Server realises this, it is free to use some more intelligent index lookups to work out the maximum values (which is what's happening in your second query).

Just a theory, but it might be worth a try.

Try to tell SQL Server to use the index:

...
FROM [fctWMAUA] (NOLOCK, INDEX(IX)) AS [t0]
...

Make sure the statistics for the table are up to date:

UPDATE STATISTICS [fctWMAUA]

For better answers, turn on the showplan for both queries:

SET SHOWPLAN_TEXT ON

and add the results to your question.
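
Read counts like the ones quoted in the question can be reproduced with SET STATISTICS IO. A minimal sketch, assuming the statements are run from SSMS or sqlcmd (GO is a batch separator of those tools, not T-SQL). Note that while SHOWPLAN_TEXT is ON, statements are compiled but not executed, so it has to be switched OFF again before measuring.

```sql
-- Report per-table logical, physical and read-ahead reads, plus CPU/elapsed time.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- The slow form of the query from the question.
SELECT MAX(t0.AsAtDateKey) AS [Date], t0.SourceSystemKey AS SourceSystem
FROM fctWMAUA AS t0
WHERE t0.SourceSystemKey IN (1,2,3,4,5,6,7,8,9)
GROUP BY t0.SourceSystemKey;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

The figures appear in the Messages output, which makes it easy to compare the IN version against the UNION version side by side.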

You can also write the query without a GROUP BY. For example, you can use an exclusive LEFT JOIN excluding rows with older dates:

-- keep only the rows for which no later row exists for the same key
select cur.SourceSystemKey, cur.AsAtDateKey
from fctWMAUA cur
left join fctWMAUA next
    on next.SourceSystemKey = cur.SourceSystemKey
    and next.AsAtDateKey > cur.AsAtDateKey
where next.SourceSystemKey is null
and cur.SourceSystemKey in (1,2,3,4,5,6,7,8,9)

This can be surprisingly fast, but I don't think it could beat the UNION.

Use HAVING instead of WHERE, so that the filtering happens AFTER grouping has occurred:

SELECT MAX(AsAtDateKey) AS [Date], SourceSystemKey AS SourceSystem
FROM fctWMAUA (NOLOCK)
GROUP BY SourceSystemKey
HAVING SourceSystemKey in (1,2,3,4,5,6,7,8,9)

I also don't particularly care for the IN clause, especially when it could be replaced with "<10" or "BETWEEN 1 AND 9", which are used better by sorted indexes.

 WHERE SourceSystemKey = 3
 GROUP BY [t0].[SourceSystemKey]

You don't need to group by a fixed field.

Anyway, I prefer the first statement. Maybe I would replace the

 WHERE SourceSystemKey in (1,2,3,4,5,6,7,8,9)

with something like

 WHERE SourceSystemKey BETWEEN 1 AND 9

or

 WHERE SourceSystemKey >= 1 AND SourceSystemKey <= 9

if SourceSystemKey is an integer. But I don't think it will cause a big change.

What I would test first is to rebuild the statistics and all indexes for the table, then wait some time. Rebuilding is not instant; how long it takes depends on how busy the server is. But the statement is well structured for the optimizer to use the index.

Regards.
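
A rough sketch of that maintenance step, assuming the table can tolerate the load; on a large fact table a full rebuild is expensive and is best run in a quiet window.

```sql
-- Rebuild every index on the fact table.
-- A rebuild also refreshes the statistics that belong to those indexes.
ALTER INDEX ALL ON fctWMAUA REBUILD;

-- Refresh the remaining (column-level) statistics by reading all rows
-- instead of a sample.
UPDATE STATISTICS fctWMAUA WITH FULLSCAN;
```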

Have you tried creating another index just on the SourceSystemKey column? The high number of logical reads when you use that column in your WHERE clause makes me think it is doing an index/table scan. Could you run the execution plan on this and see if that's the case? The execution plan might come up with an index suggestion as well.
