LINQ and Entity Framework - Avoiding subqueries

I'm having a really hard time tuning one of the Entity Framework generated queries in my application. It is a very basic query, but for some reason EF uses multiple inner subqueries, which seem to perform horribly in the database, instead of using joins.

Here's my LINQ code:

Projects.Select(proj => new ProjectViewModel()
                {
                    Name = proj.Name,
                    Id = proj.Id,
                    Total = proj.Subvalue.Where(subv =>
                        subv.Created >= startDate
                        && subv.Created <= endDate
                        &&
                        (subv.StatusId == 1 ||
                         subv.StatusId == 2))
                        .Select(c => c.SubValueSum)
                        .DefaultIfEmpty()
                        .Sum()
                })
                .OrderByDescending(c => c.Total)
                .Take(10);

EF generates a really complex query with multiple subqueries, and its performance is awful. The generated SQL looks like this:

SELECT TOP (10) 
[Project3].[Id] AS [Id], 
[Project3].[Name] AS [Name], 
[Project3].[C1] AS [C1]
FROM ( SELECT 
    [Project2].[Id] AS [Id], 
    [Project2].[Name] AS [Name], 
    [Project2].[C1] AS [C1]
    FROM ( SELECT 
        [Extent1].[Id] AS [Id], 
        [Extent1].[Name] AS [Name], 
        (SELECT 
            SUM([Join1].[A1]) AS [A1]
            FROM ( SELECT 
                CASE WHEN ([Project1].[C1] IS NULL) THEN cast(0 as decimal(18)) ELSE [Project1].[SubValueSum] END AS [A1]
                FROM   ( SELECT 1 AS X ) AS [SingleRowTable1]
                LEFT OUTER JOIN  (SELECT 
                    [Extent2].[SubValueSum] AS [SubValueSum], 
                    cast(1 as tinyint) AS [C1]
                    FROM [dbo].[Subvalue] AS [Extent2]
                    WHERE ([Extent1].[Id] = [Extent2].[Id]) AND ([Extent2].[Created] >= '2015-08-01') AND ([Extent2].[Created] <= '2015-10-01') AND ([Extent2].[StatusId] IN (1,2)) ) AS [Project1] ON 1 = 1
            )  AS [Join1]) AS [C1]
        FROM [dbo].[Project] AS [Extent1]
        WHERE ([Extent1].[ProjectCountryId] = 77) AND ([Extent1].[Active] = 1)
    )  AS [Project2]
)  AS [Project3]
ORDER BY [Project3].[C1] DESC;

The execution time of the query generated by EF is ~10 seconds. But when I write the query by hand like this:

select 
    TOP (10)
    Proj.Id,
    Proj.Name,
    SUM(Subv.SubValueSum) AS Total
from 
    SubValue as Subv
left join
    Project as Proj on Proj.Id = Subv.ProjectId
where
    Subv.Created > '2015-08-01' AND Subv.Created <= '2015-10-01' AND Subv.StatusId IN (1,2)
group by
    Proj.Id,
    Proj.Name
order by 
    Total DESC

The execution time is near instant: below 30 ms.

The problem clearly lies in my ability to write good EF queries with LINQ, but no matter what I try (using LINQPad for testing) I just can't write a query with LINQ/EF that performs as well as the one I write by hand. I've tried querying the SubValue table and the Project table, but the outcome is mostly the same: multiple inefficient nested subqueries instead of a single join doing the work.

How can I write a query that imitates the hand-written SQL shown above? How can I control the actual query generated by EF? And most importantly: how can I get Linq2SQL and Entity Framework to use joins when I want them to, instead of nested subqueries?

EF generates SQL from the LINQ expression you provide, and you cannot expect this conversion to completely unravel the structure of whatever you put into the expression in order to optimize it. In your case you have created an expression tree that, for each project, uses a navigation property to sum some subvalues related to the project. This results in nested subqueries, as you have discovered.

To improve the generated SQL you need to avoid navigating from project to subvalue before doing all the operations on subvalue. You can do this by creating a join (which is also what you do in your hand-crafted SQL):

var query = from proj in context.Project
            join s in context.SubValue.Where(s => s.Created >= startDate && s.Created <= endDate && (s.StatusId == 1 || s.StatusId == 2)) on proj.Id equals s.ProjectId into s2
            from subv in s2.DefaultIfEmpty()
            select new { proj, subv } into x
            group x by new { x.proj.Id, x.proj.Name } into g
            select new {
              g.Key.Id,
              g.Key.Name,
              Total = g.Select(y => y.subv.SubValueSum).Sum()
            } into y
            orderby y.Total descending
            select y;
var result = query.Take(10);

The basic idea is to join projects on subvalues restricted by a where clause. To perform a left join you need the DefaultIfEmpty(), but you already know that.

The joined values (x) are then grouped, and the summation of SubValueSum is performed in each group.

Finally, ordering and TOP(10) are applied.

The generated SQL still contains subqueries, but I would expect it to be more efficient than the SQL generated by your query:
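
To make the three steps above concrete, here is a small sketch that runs the same query shape with LINQ to Objects instead of EF. The in-memory tuples and their sample values are made up for illustration and only stand in for the Project and SubValue tables. Because value tuples are used, DefaultIfEmpty() yields an all-zero row for a project without matches, so its total is 0; in SQL, SUM over a left join with no matching rows returns NULL instead.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the Project and SubValue tables.
var projects = new[]
{
    (Id: 1, Name: "Alpha"),
    (Id: 2, Name: "Beta"),
    (Id: 3, Name: "Gamma"), // has no subvalues at all
};
var subValues = new[]
{
    (ProjectId: 1, Created: new DateTime(2015, 8, 15), StatusId: 1, SubValueSum: 10m),
    (ProjectId: 1, Created: new DateTime(2015, 9, 1),  StatusId: 2, SubValueSum: 5m),
    (ProjectId: 1, Created: new DateTime(2015, 9, 2),  StatusId: 3, SubValueSum: 100m), // status filtered out
    (ProjectId: 2, Created: new DateTime(2015, 7, 1),  StatusId: 1, SubValueSum: 50m),  // before the date range
    (ProjectId: 2, Created: new DateTime(2015, 9, 10), StatusId: 1, SubValueSum: 20m),
};
var startDate = new DateTime(2015, 8, 1);
var endDate = new DateTime(2015, 10, 1);

var result = (
    from proj in projects
    // Step 1: left join projects to the already-filtered subvalues.
    join s in subValues.Where(v => v.Created >= startDate && v.Created <= endDate
                                   && (v.StatusId == 1 || v.StatusId == 2))
        on proj.Id equals s.ProjectId into matches
    from subv in matches.DefaultIfEmpty()
    // Step 2: group the joined rows per project and sum inside each group.
    group subv by new { proj.Id, proj.Name } into g
    select new { g.Key.Id, g.Key.Name, Total = g.Sum(v => v.SubValueSum) } into row
    // Step 3: order by the total and keep the top 10.
    orderby row.Total descending
    select row
).Take(10).ToList();

foreach (var row in result)
    Console.WriteLine($"{row.Name} {row.Total}");
// Prints:
// Beta 20
// Alpha 15
// Gamma 0
```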

SELECT TOP (10)
    [Project1].[Id] AS [Id],
    [Project1].[Name] AS [Name],
    [Project1].[C1] AS [C1]
    FROM ( SELECT
        [GroupBy1].[A1] AS [C1],
        [GroupBy1].[K1] AS [Id],
        [GroupBy1].[K2] AS [Name]
        FROM ( SELECT
            [Extent1].[Id] AS [K1],
            [Extent1].[Name] AS [K2],
            SUM([Extent2].[SubValueSum]) AS [A1]
            FROM  [dbo].[Project] AS [Extent1]
            LEFT OUTER JOIN [dbo].[SubValue] AS [Extent2] ON ([Extent2].[Created] >= @p__linq__0) AND ([Extent2].[Created] <= @p__linq__1) AND ([Extent2].[StatusId] IN (1,2)) AND ([Extent1].[Id] = [Extent2].[ProjectId])
            GROUP BY [Extent1].[Id], [Extent1].[Name]
        )  AS [GroupBy1]
    )  AS [Project1]
    ORDER BY [Project1].[C1] DESC
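
If you want something closer to the hand-written SQL, which starts from SubValue and therefore only returns projects that have at least one matching subvalue, one possible shape is to filter and group the subvalues first and join to the projects afterwards. The sketch below shows that shape with LINQ to Objects on made-up in-memory data; I have not checked what SQL EF generates for it, so treat it as a starting point to inspect in LINQPad rather than a guaranteed improvement.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-ins for the Project and SubValue tables.
var projects = new[] { (Id: 1, Name: "Alpha"), (Id: 2, Name: "Beta"), (Id: 3, Name: "Gamma") };
var subValues = new[]
{
    (ProjectId: 1, Created: new DateTime(2015, 8, 15), StatusId: 1, SubValueSum: 10m),
    (ProjectId: 1, Created: new DateTime(2015, 9, 1),  StatusId: 2, SubValueSum: 5m),
    (ProjectId: 2, Created: new DateTime(2015, 9, 10), StatusId: 1, SubValueSum: 20m),
    (ProjectId: 2, Created: new DateTime(2015, 9, 11), StatusId: 3, SubValueSum: 99m), // status filtered out
};
var startDate = new DateTime(2015, 8, 1);
var endDate = new DateTime(2015, 10, 1);

var top = subValues
    // Filter subvalues first, as the WHERE clause of the hand-written SQL does.
    .Where(v => v.Created >= startDate && v.Created <= endDate
                && (v.StatusId == 1 || v.StatusId == 2))
    // Aggregate per project id before touching the project data.
    .GroupBy(v => v.ProjectId)
    .Select(g => new { ProjectId = g.Key, Total = g.Sum(v => v.SubValueSum) })
    // Inner join: projects without matching subvalues (Gamma) drop out.
    .Join(projects, t => t.ProjectId, p => p.Id, (t, p) => new { p.Id, p.Name, t.Total })
    .OrderByDescending(r => r.Total)
    .Take(10)
    .ToList();

foreach (var row in top)
    Console.WriteLine($"{row.Name} {row.Total}");
// Prints:
// Beta 20
// Alpha 15
```

Note that this is not equivalent to the left-join query above: projects without matching subvalues are omitted instead of being listed with an empty total.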
