Entity Framework generates inefficient SQL for paged query

Question

I have a simple paged linq query against one entity:

var data = (from t in ctx.ObjectContext.Widgets
           where t.CampaignId == campaignId && 
                 t.CalendarEventId == calendarEventId
                 (t.RecurringEventId IS NULL OR t.RecurringEventId = recurringEventId)
           select t);

data = data.OrderBy(t => t.Id);

if (page > 0)
{
    data = data.Skip(rows * (page - 1)).Take(rows);
}

var l = data.ToList();

I expected it to generate SQL similar to:

select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id

When I run the above query in SSMS, it returns quickly (had to rebuild my indexes first).

However, the generated SQL is different. It contains a nested query as shown below:

SELECT TOP (50) 
[Project1].[Id] AS [Id], 
[Project1].[CampaignId] AS [CampaignId]
<redacted>
FROM ( SELECT [Project1].[Id] AS [Id], 
[Project1].[CampaignId] AS [CampaignId], 
<redacted>, 
row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
    FROM ( SELECT 
        [Extent1].[Id] AS [Id], 
        [Extent1].[CampaignId] AS [CampaignId], 
        <redacted>
        FROM [dbo].[Widgets] AS [Extent1]
        WHERE ([Extent1].[CampaignId] = @p__linq__0) AND ([Extent1].[CalendarEventId] = @p__linq__1) AND ([Extent1].[RecurringEventId] = @p__linq__2 OR [Extent1].[RecurringEventId] IS NULL)
    )  AS [Project1]
)  AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[Id] ASC

The Widgets table is enormous and the inner query returns 100000s of records, causing a timeout.

Is there anything I can do to change the generation? Anything I am doing wrong?

UPDATE

I finally managed to refactor my code to return the results relatively quickly:

var data = (from t in ctx.ObjectContext.Widgets
           where t.CampaignId == campaignId && 
                 t.CalendarEventId == calendarEventId
                 (t.RecurringEventId IS NULL OR t.RecurringEventId = recurringEventId)
           select t)).AsEnumerable().Select((item, index) => new { Index = index, Item = item });

            data = data.OrderBy(t => t.Index);

            if (page > 0)
            {
                data = data.Where(t => t.Index >= (rows * (page - 1)));
            }

            data = data.Take(rows);

Note, the page > 0 logic is simply used to prevent an invalid parameter being used; it does no optimization. In fact page > 1 , while valid, does not provide any noticeable optimization for the 1st page; since the Where is not a slow operation.

Answer 1

Prior SQL Server 2012, the generated SQL code is the best way to perform pagging. Yes, it is awfull and very inefficient but is the best you can do even writing your own SQL scritp by hand. There are tons of digital ink about this in the net. Just google it.

In the firt page, this can be optimized not doing Skip and just Take but in any other page you are f***** up.

A workarround could be to generate your own row_number in persistence (an auto-identity could work) and just do where(widget.number > (page*rows) ).Take(rows) in code. If there is a good index in your widget.number the query should be very fast. But , this breaks the dynamic orderBy .

However, I can see in your code that you are ordering by widget.id always; so, if dynamic orderBy is not essential, this could be a valid workaround.

Will you take your own medicine?

could you ask me.

No, I will not. The best way to deal with this is having a persistence read-model in wich you can even have one table per widget orderBy field with its own widget.number . The problem is that modeling a system with a persistence read-model just for this issue is too crazy. Having a read-model is part of the overall design of your system and requires taking it in account from the very beginning of the design and development of a system.

Answer 2

The generated query is so complex and nested because you used Skip method. In T-SQL Take is easy achievable by using just Top, but that is not the case with Skip - to apply it you need row_number and that is why there is a nested query - inner returns rows with row_number and outer filters them to get proper amount of rows. Your query:

select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id

lacks Skipping initial rows. To keep the query very efficient it would be best to, instead of using Take and Skip to keep paging by condition on Id, because you are ordering your rows for paging basing on that field:

var data = (from t in ctx.ObjectContext.Widgets
       where t.CampaignId == campaignId && 
             t.CalendarEventId == calendarEventId
             (t.RecurringEventId IS NULL OR t.RecurringEventId = recurringEventId)
       select t);

data = data
    .OrderBy(t => t.Id);
    .Where(t => t.Id >= rows * (page - 1) && t.Id < rows * page )
    .ToList();

Answer 3

AFAIK you cannot change query generated by Entity. Although you can force entity to run raw SQL query:

https://msdn.microsoft.com/en-us/data/jj592907.aspx

You can also use stored procedures:

https://msdn.microsoft.com/en-us/data/gg699321.aspx

Even if there's a chance to change generated query IMO it would be spitting into the wind. I bet that easier will be to write the SQL query on your own.

Entity Framework generates inefficient SQL for paged query

Question

3 answers

solution1
1 ACCPTED 2015-07-22 10:11:28

solution2
1 2015-07-22 11:21:13

solution3
0 2015-07-22 09:10:18

Entity Framework generates inefficient SQL for paged query

Question

3 answers

solution1 1 ACCPTED 2015-07-22 10:11:28

solution2 1 2015-07-22 11:21:13

solution3 0 2015-07-22 09:10:18

solution1
1 ACCPTED 2015-07-22 10:11:28

solution2
1 2015-07-22 11:21:13

solution3
0 2015-07-22 09:10:18