
How to structure an index for GROUP BY in SQL Server

The following simple query takes a very long time (several minutes) to execute.

I have an index:

create index IX on [fctWMAUA] (SourceSystemKey, AsAtDateKey)

SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
FROM [fctWMAUA] (NOLOCK) AS [t0]
WHERE SourceSystemKey in (1,2,3,4,5,6,7,8,9)
GROUP BY [t0].[SourceSystemKey]

The statistics are as follows:

  • logical reads 1827978
  • physical reads 1113
  • read aheads 1806459

Taking that exact same query and reformatting it as follows gives me these statistics:

  • logical reads 36
  • physical reads 0
  • read aheads 0

It takes 31ms to execute.

SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 1
 GROUP BY [t0].[SourceSystemKey]
UNION
 SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 2
 GROUP BY [t0].[SourceSystemKey]
UNION
 SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
 FROM [fctWMAUA] (NOLOCK) AS [t0]
 WHERE SourceSystemKey = 3
 GROUP BY [t0].[SourceSystemKey]
/* AND SO ON TO 9 */

How do I make an index that does the group by quickly?

I have found that the best solution is the following. It mimics the union version of the query, and runs very quickly.

40 logical reads, and an execution time of 3ms.

SELECT [t3].[value]
FROM [dimSourceSystem] AS [t0]
OUTER APPLY (
    SELECT MAX([t2].[value]) AS [value]
    FROM (
        SELECT [t1].[AsAtDateKey] AS [value], [t1].[SourceSystemKey]
        FROM [fctWMAUA] AS [t1]
        ) AS [t2]
    WHERE [t2].[SourceSystemKey] = ([t0].[SourceSystemKey])
    ) AS [t3]
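
A variant of the same idea, offered as a sketch rather than a tested replacement: it performs one lookup per key, but takes the keys from an inline VALUES list instead of [dimSourceSystem] and returns the key alongside the date. The aliases [s], [f] and [m] are made up for illustration, and the VALUES table constructor assumes SQL Server 2008 or later.

```sql
-- One MAX lookup per listed key; each lookup can be satisfied from
-- the index IX (SourceSystemKey, AsAtDateKey).
SELECT [s].[SourceSystemKey] AS [SourceSystem], [m].[Date]
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)) AS [s]([SourceSystemKey])
CROSS APPLY (
    SELECT MAX([f].[AsAtDateKey]) AS [Date]
    FROM [fctWMAUA] AS [f]
    WHERE [f].[SourceSystemKey] = [s].[SourceSystemKey]
) AS [m]
-- A scalar aggregate always returns a row, so drop keys that have
-- no rows in the fact table to match the GROUP BY result.
WHERE [m].[Date] IS NOT NULL;
```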

It's difficult to say without looking at an execution plan, however you might want to try the following:

SELECT * FROM
(
    SELECT MAX(t0.AsAtDateKey) AS [Date], t0.SourceSystemKey AS SourceSystem
    FROM fctWMAUA (NOLOCK) AS t0
    GROUP BY t0.SourceSystemKey
) AS grouped
WHERE SourceSystem in (1,2,3,4,5,6,7,8,9)

It's difficult to tell without looking at an execution plan, but I think what's happening is that SQL Server is not clever enough to realise that the WHERE clause only filters out whole groups and has no effect on which records are included in each group. As soon as SQL Server realises this, it's free to use more intelligent index lookups to work out the maximum values (which is what's happening in your second query).

Just a theory, but it might be worth a try.

Try to tell SQL Server to use the index:

...
FROM [fctWMAUA] (NOLOCK, INDEX(IX)) AS [t0]
...

Make sure the statistics for the table are up to date:

UPDATE STATISTICS [fctWMAUA]

For better answers, turn on the showplan for both queries:

SET SHOWPLAN_TEXT ON

and add the results to your question.
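
As a sketch of how the plan and the read counts from the question could be captured, assuming the query is run from SSMS or sqlcmd (GO is a batch separator of those tools, not a T-SQL statement):

```sql
-- Show the estimated plan as text. While this is ON the query is
-- compiled but not executed. The SET must be alone in its batch.
SET SHOWPLAN_TEXT ON;
GO
SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
FROM [fctWMAUA] AS [t0]
WHERE [t0].[SourceSystemKey] IN (1,2,3,4,5,6,7,8,9)
GROUP BY [t0].[SourceSystemKey];
GO
SET SHOWPLAN_TEXT OFF;
GO

-- Execute the query for real and report logical/physical reads
-- and elapsed time in the Messages output.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
SELECT MAX([t0].[AsAtDateKey]) AS [Date], [t0].[SourceSystemKey] AS [SourceSystem]
FROM [fctWMAUA] AS [t0]
WHERE [t0].[SourceSystemKey] IN (1,2,3,4,5,6,7,8,9)
GROUP BY [t0].[SourceSystemKey];
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```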

You can also write the query without a GROUP BY. For example, you can use an exclusive LEFT JOIN excluding rows with older dates:

select cur.SourceSystemKey, cur.AsAtDateKey
from fctWMAUA cur
left join fctWMAUA next
    on next.SourceSystemKey = cur.SourceSystemKey
    and next.AsAtDateKey > cur.AsAtDateKey
where next.SourceSystemKey is null
and cur.SourceSystemKey in (1,2,3,4,5,6,7,8,9)

This can be surprisingly fast, but I don't think it could beat the UNION.

Use HAVING instead of WHERE, so that the filtering happens AFTER grouping has occurred:

SELECT MAX(AsAtDateKey) AS [Date], SourceSystemKey AS SourceSystem
FROM fctWMAUA (NOLOCK)
GROUP BY SourceSystemKey
HAVING SourceSystemKey in (1,2,3,4,5,6,7,8,9)

I also don't particularly care for the IN clause, especially when it could be replaced with "<10" or "Between 1 and 9", which are used better by sorted indexes.

 WHERE SourceSystemKey = 3
 GROUP BY [t0].[SourceSystemKey]

You don't need to group by a fixed field.

Anyway, I prefer the first statement. Maybe I would replace the

 WHERE SourceSystemKey in (1,2,3,4,5,6,7,8,9)

for something like

 WHERE SourceSystemKey BETWEEN 1 AND 9

or

 WHERE SourceSystemKey >= 1 AND SourceSystemKey <= 9

if SourceSystemKey is an integer. But I don't think it will cause a big change.

The first thing I would test is updating the statistics and rebuilding all indexes on the table. Rebuilding is not instant; how long it takes depends on how busy the server is. The statement itself is already well structured for the optimizer to use the index.
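
A minimal sketch of that maintenance step, assuming it can be run in a quiet period (a rebuild is resource-intensive on a large fact table, and FULLSCAN reads every row):

```sql
-- Rebuild every index on the table; this also refreshes the
-- statistics that belong to those indexes.
ALTER INDEX ALL ON [fctWMAUA] REBUILD;

-- Refresh all statistics on the table from a full scan
-- rather than a sample.
UPDATE STATISTICS [fctWMAUA] WITH FULLSCAN;
```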

Regards.

Have you tried creating another index just on the SourceSystemKey column? The high number of logical reads when you use that column in your where clause makes me think it is doing an index/table scan. Could you run the execution plan on this and see if that's the case? The execution plan might come up with an index suggestion as well.

