Entity Framework generates inefficient SQL for paged query

I have a simple paged LINQ query against one entity:

var data = (from t in ctx.ObjectContext.Widgets
           where t.CampaignId == campaignId &&
                 t.CalendarEventId == calendarEventId &&
                 (t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
           select t);

data = data.OrderBy(t => t.Id);

if (page > 0)
{
    data = data.Skip(rows * (page - 1)).Take(rows);
}

var l = data.ToList(); 

I expected it to generate SQL similar to:

select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id

When I run the above query in SSMS, it returns quickly (I had to rebuild my indexes first).

However, the generated SQL is different. It contains a nested query, as shown below:

SELECT TOP (50) 
[Project1].[Id] AS [Id], 
[Project1].[CampaignId] AS [CampaignId]
<redacted>
FROM ( SELECT [Project1].[Id] AS [Id], 
[Project1].[CampaignId] AS [CampaignId], 
<redacted>, 
row_number() OVER (ORDER BY [Project1].[Id] ASC) AS [row_number]
    FROM ( SELECT 
        [Extent1].[Id] AS [Id], 
        [Extent1].[CampaignId] AS [CampaignId], 
        <redacted>
        FROM [dbo].[Widgets] AS [Extent1]
        WHERE ([Extent1].[CampaignId] = @p__linq__0) AND ([Extent1].[CalendarEventId] = @p__linq__1) AND ([Extent1].[RecurringEventId] = @p__linq__2 OR [Extent1].[RecurringEventId] IS NULL)
    )  AS [Project1]
)  AS [Project1]
WHERE [Project1].[row_number] > 0
ORDER BY [Project1].[Id] ASC

The Widgets table is enormous and the inner query returns hundreds of thousands of records, causing a timeout.

Is there anything I can do to change the generated SQL? Am I doing anything wrong?

UPDATE

I finally managed to refactor my code to return the results relatively quickly:

var data = (from t in ctx.ObjectContext.Widgets
            where t.CampaignId == campaignId &&
                  t.CalendarEventId == calendarEventId &&
                  (t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
            select t)
           .AsEnumerable() // everything after this point runs in memory, not in SQL
           .Select((item, index) => new { Index = index, Item = item });

data = data.OrderBy(t => t.Index);

if (page > 0)
{
    data = data.Where(t => t.Index >= (rows * (page - 1)));
}

data = data.Take(rows);

Note that the page > 0 check is only there to guard against an invalid parameter; it is not an optimization. In fact, page > 1, while valid, would not provide any noticeable optimization for the first page, since the Where is not a slow operation.
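As a rough illustration of what the refactored query does once the rows are in memory, the sketch below pages a small in-memory array by index and compares the result with plain Skip/Take. The array widgetIds and its values are made up for the example; the real query would first pull every matching row from the database because of AsEnumerable().

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the already-filtered rows: ten widget ids.
int[] widgetIds = Enumerable.Range(101, 10).ToArray();

int rows = 3; // page size
int page = 2; // 1-based page number, as in the question

int offset = rows * (page - 1);

// Index-based paging, mirroring the refactored query.
int[] byIndex = widgetIds
    .Select((id, index) => new { Index = index, Id = id })
    .Where(x => x.Index >= offset)
    .Take(rows)
    .Select(x => x.Id)
    .ToArray();

// The same page using Skip/Take on the in-memory sequence.
int[] bySkip = widgetIds.Skip(offset).Take(rows).ToArray();

Console.WriteLine(string.Join(",", byIndex));      // prints 104,105,106
Console.WriteLine(byIndex.SequenceEqual(bySkip));  // prints True
```

Both forms select the same rows; the difference in the real code is only where the work happens (on the client instead of in SQL Server).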

Prior to SQL Server 2012, the generated SQL is the best way to perform paging. Yes, it is awful and very inefficient, but it is the best you can do even if you write your own SQL script by hand. There is plenty written about this online. Just google it.

On the first page, this can be optimized by not doing Skip and just doing Take, but on any other page you are f***** up.

A workaround could be to generate your own row_number in persistence (an auto-identity could work) and just do where(widget.number > (page*rows)).Take(rows) in code. If there is a good index on widget.number, the query should be very fast. But this breaks the dynamic orderBy.

However, I can see in your code that you are always ordering by widget.id; so, if dynamic orderBy is not essential, this could be a valid workaround.
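A minimal in-memory sketch of that workaround, under a few assumptions: Number is a hypothetical persisted column that runs 1..N without gaps over exactly the rows being paged, and page is a zero-based index (which is what the expression page*rows implies). Against a real table the same Where/Take shape would be applied to the IQueryable; here a plain list stands in for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows: Ids have gaps, Number is a dense persisted sequence.
int[] ids = { 3, 7, 8, 15, 21, 22, 40 };
var widgets = new List<(int Id, int Number)>();
for (int i = 0; i < ids.Length; i++)
{
    widgets.Add((ids[i], i + 1));
}

int rows = 3; // page size
int page = 1; // zero-based page index (assumption)

// Filter on the persisted number, then take one page. No Skip is needed.
List<int> pageIds = widgets
    .Where(w => w.Number > page * rows)
    .OrderBy(w => w.Number)
    .Take(rows)
    .Select(w => w.Id)
    .ToList();

Console.WriteLine(string.Join(",", pageIds)); // prints 15,21,22
```

Note that a single table-wide identity would not be dense once the CampaignId and CalendarEventId filters are applied, so the numbering probably has to be kept per filtered set for the pages to come out full.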

"Will you take your own medicine?"

you could ask me.

No, I will not. The best way to deal with this is to have a persistence read-model, in which you can even have one table per widget orderBy field, each with its own widget.number. The problem is that modeling a system with a persistence read-model just for this issue is too crazy. Having a read-model is part of the overall design of your system and has to be taken into account from the very beginning of its design and development.

The generated query is so complex and nested because you used the Skip method. In T-SQL, Take is easily achieved with just TOP, but that is not the case with Skip: to apply it you need row_number, and that is why there is a nested query. The inner query returns rows with row_number and the outer one filters them to get the proper set of rows. Your query:

select top 50 * from Widgets w where CampaignId = xxx AND CalendarEventId = yyy AND (RecurringEventId IS NULL OR RecurringEventId = zzz) order by w.Id

lacks skipping of the initial rows. To keep the query efficient, it would be best, instead of using Take and Skip, to page by a condition on Id, because that is the field you order your rows by for paging:

var data = (from t in ctx.ObjectContext.Widgets
       where t.CampaignId == campaignId &&
             t.CalendarEventId == calendarEventId &&
             (t.RecurringEventId == null || t.RecurringEventId == recurringEventId)
       select t);

// Page by an Id range instead of Skip/Take.
// Note: this returns full pages only if the matching Ids are contiguous.
var result = data
    .OrderBy(t => t.Id)
    .Where(t => t.Id >= rows * (page - 1) && t.Id < rows * page)
    .ToList();
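If the Ids of the matching rows have gaps, a fixed Id range per page will return short or empty pages. A common variant, sketched here as an assumption rather than as part of the answer above, is to remember the last Id of the previous page and ask for the next rows after it. The helper NextPage, the sample ids, and the starting value 0 (which assumes positive Ids) are all hypothetical; the sketch runs in memory, and against EF the same Where/OrderBy/Take chain would be applied to the IQueryable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Ids of the rows that match the filter; note the gaps.
int[] matchingIds = { 3, 7, 8, 15, 21, 22, 40 };
int rows = 3; // page size

// Returns the next page of ids strictly after lastSeenId.
static List<int> NextPage(IEnumerable<int> source, int lastSeenId, int pageSize) =>
    source.Where(id => id > lastSeenId)
          .OrderBy(id => id)
          .Take(pageSize)
          .ToList();

List<int> first = NextPage(matchingIds, 0, rows);              // 0 assumes all Ids are positive
List<int> second = NextPage(matchingIds, first.Last(), rows);
List<int> third = NextPage(matchingIds, second.Last(), rows);

Console.WriteLine(string.Join(",", first));  // prints 3,7,8
Console.WriteLine(string.Join(",", second)); // prints 15,21,22
Console.WriteLine(string.Join(",", third));  // prints 40
```

The trade-off is that the caller has to carry the last seen Id from page to page, so jumping straight to an arbitrary page number is no longer possible.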

AFAIK you cannot change the query generated by Entity Framework, although you can force it to run a raw SQL query:

https://msdn.microsoft.com/en-us/data/jj592907.aspx

You can also use stored procedures:

https://msdn.microsoft.com/en-us/data/gg699321.aspx

Even if there were a way to change the generated query, IMO it would be spitting into the wind. I bet it will be easier to write the SQL query on your own.
