Why Does Select Top X Kill Performance

Question

I have the T-SQL Query below (generated by linq-to-sql, so it looks a bit weird.) When I remove the top 10, it runs in under 1 second. When I leave the top 10 in, it takes 68 seconds.

Without the top 10, the total number of rows is only 19.

Why does the top 10 destroy performance so much in this situation?

SELECT TOP (10) [t3].[value] AS [StartDate], [t3].[AH_TimeStamp4] AS [EndDate], [t3].[value2] AS [RunTime], [t3].[AH_Idnr] AS [RunNumber], [t3].[AH_Status] AS [StatusCode], [t3].[STATUS_DESC] AS [StatusDesc], [t3].[STATUS_TYPE] AS [StatusType]
FROM (
    SELECT COALESCE([t1].[AH_TimeStamp2],[t1].[AH_TimeStamp1]) AS [value], [t1].[AH_TimeStamp4], CONVERT(BigInt,(((CONVERT(BigInt,DATEDIFF(DAY, [t1].[AH_TimeStamp1], [t1].[AH_TimeStamp4]))) * 86400000) + DATEDIFF(MILLISECOND, DATEADD(DAY, DATEDIFF(DAY, [t1].[AH_TimeStamp1], [t1].[AH_TimeStamp4]), [t1].[AH_TimeStamp1]), [t1].[AH_TimeStamp4])) * 10000) AS [value2], [t1].[AH_Idnr], [t1].[AH_Status], [t2].[STATUS_DESC], [t2].[STATUS_TYPE], [t0].[OH_Name]
    FROM [dbo].[OH] AS [t0]
    INNER JOIN [dbo].[AH] AS [t1] ON ([t0].[OH_Idnr]) = [t1].[AH_OH_Idnr]
    INNER JOIN [dbo].[CHK_JOB_STATUS_CODE] AS [t2] ON [t1].[AH_Status] = ([t2].[STATUS_CODE])
    ) AS [t3]
WHERE [t3].[OH_Name] = @p0
ORDER BY [t3].[AH_Idnr] DESC

Edit -> as requested, here's the linq-generated query without the Take(10)

SELECT [t3].[value] AS [StartDate], [t3].[AH_TimeStamp4] AS [EndDate], [t3].[value2] AS [RunTime], [t3].[AH_Idnr] AS [RunNumber], [t3].[AH_Status] AS [StatusCode], [t3].[STATUS_DESC] AS [StatusDesc], [t3].[STATUS_TYPE] AS [StatusType]
FROM (
    SELECT COALESCE([t1].[AH_TimeStamp2],[t1].[AH_TimeStamp1]) AS [value], [t1].[AH_TimeStamp4], CONVERT(BigInt,(((CONVERT(BigInt,DATEDIFF(DAY, [t1].[AH_TimeStamp1], [t1].[AH_TimeStamp4]))) * 86400000) + DATEDIFF(MILLISECOND, DATEADD(DAY, DATEDIFF(DAY, [t1].[AH_TimeStamp1], [t1].[AH_TimeStamp4]), [t1].[AH_TimeStamp1]), [t1].[AH_TimeStamp4])) * 10000) AS [value2], [t1].[AH_Idnr], [t1].[AH_Status], [t2].[STATUS_DESC], [t2].[STATUS_TYPE], [t0].[OH_Name]
    FROM [dbo].[OH] AS [t0]
    INNER JOIN [dbo].[AH] AS [t1] ON ([t0].[OH_Idnr]) = [t1].[AH_OH_Idnr]
    INNER JOIN [dbo].[CHK_JOB_STATUS_CODE] AS [t2] ON [t1].[AH_Status] = ([t2].[STATUS_CODE])
    ) AS [t3]
WHERE [t3].[OH_Name] = @p0
ORDER BY [t3].[AH_Idnr] DESC

Edit 2 -> here's the LINQ Query - nothing special, just 3 joins

List<UC4Status> statusList = uc4DB.OHs.Where(o => o.OH_Name == jobName).Join(uc4DB.AHs, oh => oh.OH_Idnr, ah => ah.AH_OH_Idnr, (oh, ah) => ah)
                        .Join(uc4DB.CHK_JOB_STATUS_CODEs, ah => ah.AH_Status, job => job.STATUS_CODE, (a, s) =>
                            new UC4Status
                            {
                                StartDate = a.AH_TimeStamp2 ?? a.AH_TimeStamp1,
                                EndDate = a.AH_TimeStamp4,
                                RunTime = a.AH_TimeStamp4 - a.AH_TimeStamp1,
                                StatusType = s.STATUS_TYPE,
                                StatusDesc = s.STATUS_DESC,
                                StatusCode = a.AH_Status,
                                RunNumber = a.AH_Idnr
                            }).OrderByDescending(r => r.RunNumber).Take(maxResults).ToList();

Answer 1

More important than the number of rows returned are the number of rows in each table joined. Without access to the query plan it is hard to say what the issue is. But, the issue could likely be due to outdated or missing column statistics combined with a lack of good indexes causing the query optimizer make a very bad decision when top is used. For example, if the optimizer incorrectly makes the assumption that the top statement will only operate on a small number of rows in each joined table, it could select to use inner loop joins with table scans on each joined table -- which could be a horribly slow operation on a large number of records (but can work very efficiently when working with only few records). In the case without top , the optimizer could correctly assume that it will have to scan a large volume of records in the joined tables and use an approach that better suits such an operation, such as a hash match join from a table scan.

You can try updating statistics and adding other indexes to see if that helps.

Answer 2

What's the number of rows without the top 10? Are there any indexes implemented? I would guess it's due to the subquery joins of t0, t1, and t2... Can you try something like,

select top 10 * from (
  select t3.value....
  from ( 
    select coalesce... )
  where ... order by ... 
)

Why Does Select Top X Kill Performance

Question

2 answers

solution1
2 ACCPTED 2011-01-07 23:50:58

solution2
0 2011-01-07 23:15:42

Why Does Select Top X Kill Performance

Question

2 answers

solution1 2 ACCPTED 2011-01-07 23:50:58

solution2 0 2011-01-07 23:15:42

solution1
2 ACCPTED 2011-01-07 23:50:58

solution2
0 2011-01-07 23:15:42