I need to speed up the following Linq query

I have an old stored procedure that I am rewriting as an EF LINQ query; however, the proc is almost 3 times faster!

Here is an example of the query syntax:

public string GetStringByID(long ID)
    {
        return dataContext.Table2.FirstOrDefault(x => x.Table2ID == ID).Table1.StringValue;
    }

Here is the sproc code I am using, along with the method of calling it.

The sproc is:

PROCEDURE [dbo].[MyQuickerProc]
@ID bigint
AS
BEGIN
SET NOCOUNT ON;

IF EXISTS(SELECT TOP 1 ID FROM Table2 WHERE Table2ID = @ID)
    BEGIN
        SELECT TOP 1 t1.StringValue
        FROM Table2 t2
            INNER JOIN Table1 t1 ON t1.Table1ID = t2.Table1ID
        WHERE t2.Table2ID = @ID
    END
ELSE
    BEGIN
        SELECT TOP 1 t1.StringValue
        FROM Table2 t2
            INNER JOIN Table1 t1 ON t1.Table1ID = t2.Table1ID
        WHERE t2.Table2ID IS NULL
    END

END

I call the proc like this:

string myString = context.MyQuickerProc(127).FirstOrDefault();

I have used a unit test and a stopwatch to discover that the LINQ call takes 1.3 seconds and the sproc call takes 0.5 seconds, both shockingly long! I am investigating a missing FK as we speak, as I can only assume that is the reason these calls are taking so long.

In any case, I need to speed up this LINQ query and add the functionality that the sproc has and the current LINQ query lacks (the if/else logic).

Any help on this would be much appreciated. Thanks in advance :)

Step 1: Establish a business case

The first thing we need to do is ask "How fast does it need to be?", because if we don't know how fast it needs to be, we can't know when we're done. This isn't a technical decision, it's a business one. You need a stakeholder-centric measure of "Fast Enough" to aim for, and you need to bear in mind that Fast Enough is fast enough. We aren't looking for "As Fast As Possible" unless there's a business reason for it. Even then, we're normally looking for "As Fast As Possible Within Budget".

Since you're my stakeholder, and you don't seem to be too upset about the performance of your stored procedure, let's use that as a benchmark!

Step 2: Measure

The next thing we need to do is measure our system to see if we're Fast Enough.

Thankfully you've already measured (though we'll talk more about this later). Your stored procedure runs in 0.5 seconds! Is that Fast Enough? Yes it is! Job done!

There is no justification for continuing to spend your time (and your boss's money) fixing something that isn't broken. You probably have something better to be doing, so go do that! :D


Still here? OK then. I'm not on the clock, people are badmouthing tech I like, and optimising Entity Framework queries is fun. Challenge accepted!

Step 3: Inspect

So what's going on? Why is our query so slow?

To answer that question, I'm going to need to make some assumptions about your model:-

public class Foo
{
    public int Id { get; set; }

    public int BarId { get; set; }

    public virtual Bar Bar { get; set; }
}

public class Bar
{
    public int Id { get; set; }

    public string Value { get; set; }

    public virtual ICollection<Foo> Foos { get; set; }
}

Now that we've done that, we can have a look at the horrible query that Entity Framework is making for us:-

using (var context = new FooContext())
{
    context.Database.Log = s => Console.WriteLine(s);

    var query = context.Foos.FirstOrDefault(x => x.Id == 1).Bar.Value;
}

I can see from the log that TWO queries are being run:-

SELECT TOP (1)
[Extent1].[Id] AS [Id],
[Extent1].[BarId] AS [BarId]
FROM [dbo].[Foos] AS [Extent1]
WHERE 1 = [Extent1].[Id]

SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Value] AS [Value]
FROM [dbo].[Bars] AS [Extent1]
WHERE [Extent1].[Id] = @EntityKeyValue1

Wait, what? Why is stupid Entity Framework making two round-trips to the database when all we need is one string?

Step 4: Analyze

Let's take a step back and look at our query again:-

var query = context.Foos.FirstOrDefault(x => x.Id == 1).Bar.Value;

Given what we know about deferred execution, what can we deduce is going on here?

What deferred execution basically means is that as long as you're working with an IQueryable, nothing actually happens - the query is built up in memory and not actually executed until later. This is useful for a number of reasons - in particular it lets us build up our queries in a modular fashion, then run the composed query once. Entity Framework would be pretty useless if context.Foos loaded the entire Foo table into memory immediately!

Our queries only get run when we ask for actual results rather than another IQueryable, e.g. with .ToList(), .ToArray(), or by enumerating the results (a foreach loop, which calls .GetEnumerator()). In this case .FirstOrDefault() doesn't return an IQueryable, so it triggers the database call much earlier than we presumably intended.
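
To make the deferred-execution point concrete, here is a minimal sketch that uses LINQ to Objects as a stand-in for Entity Framework, so it runs without a database. The class name DeferredExecutionDemo and the counter are made up for illustration; with a real IQueryable the "predicate" would be translated to SQL rather than run in memory, but the timing of execution follows the same idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: shows that composing a query runs nothing
// until a result is actually requested.
public static class DeferredExecutionDemo
{
    // Counts how many times the filter predicate has really been evaluated.
    public static int PredicateCalls;

    public static IEnumerable<int> BuildQuery(IEnumerable<int> source)
    {
        // Only describes the work; no element is inspected here.
        return source.Where(x => { PredicateCalls++; return x > 1; })
                     .Select(x => x * 10);
    }

    public static int Run()
    {
        PredicateCalls = 0;
        var query = BuildQuery(new[] { 1, 2, 3 });

        int callsBeforeExecution = PredicateCalls; // nothing enumerated yet
        int first = query.FirstOrDefault();        // forces execution, stops at the first match

        Console.WriteLine($"before: {callsBeforeExecution}, after: {PredicateCalls}, first: {first}");
        return first;
    }
}
```

Calling DeferredExecutionDemo.Run() shows the counter still at zero after the query is composed, and only moving once FirstOrDefault() asks for a value. That is why the position of .FirstOrDefault() in the chain decides how much of the work ends up in a single database call.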

The query we've made is basically saying:-

  • Get the first Foo with Id == 1 (or null if there aren't any)
  • Now lazy-load that Foo's Bar
  • Now tell me that Bar's Value

Wow! So not only are we making two round-trips to the database, we're also sending the entire Foo and Bar across the wire! That's not so bad when our entities are tiny like the contrived ones here, but what if they were larger, realistic ones?

Step 5: Optimize

As you've hopefully gleaned from the above, the first two rules of optimisation are 1) "Don't" and 2) "Measure first". The third rule of optimisation is "Avoid unnecessary work". An extra round-trip and a whole bunch of spurious data definitely counts as "unnecessary", so let's do something about that:-

Attempt 1

The first thing we want to do is try the declarative approach: "Find me the Value of the first Bar that has a Foo with Id == 1".

This is usually the clearest option from a maintainability point of view; the intent of the programmer is obviously captured. However, remembering that we want to delay execution as long as possible, let's pop the .FirstOrDefault() after the .Select():-

var query = context.Bars.Where(x => x.Foos.Any(y => y.Id == 1))
                        .Select(x => x.Value)
                        .FirstOrDefault();

SELECT TOP (1)
[Extent1].[Value] AS [Value]
FROM [dbo].[Bars] AS [Extent1]
WHERE  EXISTS (SELECT
    1 AS [C1]
    FROM [dbo].[Foos] AS [Extent2]
    WHERE ([Extent1].[Id] = [Extent2].[BarId]) AND (1 = [Extent2].[Id])
)

Attempt 2

In both SQL and most O/RMs, a useful trick is to make sure you're querying from the correct "end" of any given relationship. Sure, we're looking for a Bar, but we've got the Id of a Foo, so we can rewrite the query with that as a starting point: "Find me the Value of the Bar of the Foo with Id == 1":-

var query = context.Foos.Where(x => x.Id == 1)
                        .Select(x => x.Bar.Value)
                        .FirstOrDefault();

SELECT TOP (1)
[Extent2].[Value] AS [Value]
FROM  [dbo].[Foos] AS [Extent1]
INNER JOIN [dbo].[Bars] AS [Extent2] ON [Extent1].[BarId] = [Extent2].[Id]
WHERE 1 = [Extent1].[Id]

Much better. Prima facie these look preferable to both the original Entity-Framework-generated mess and the original stored procedure. Done!

Step 6: Measure

No! Just wait a minute! How do we know if we're Fast Enough? How do we even know if we're faster?

We measure!

And unfortunately you'll have to do this bit on your own. I can tell you that on my machine, on my network, simulating a realistic load for my application, the INNER JOIN is the fastest, followed by the two round-trips version (!!), followed by the WHERE EXISTS version, followed by the stored procedure. I can't tell you which will be fastest on your hardware, on your network, under a realistic load for your application.

I can tell you that I've made this exact performance optimization over a dozen times, and depending on the characteristics of the network, database server, and schema, I've seen each of INNER JOIN, WHERE EXISTS, and two round-trips give the best performance.
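
As a rough starting point for that measurement, here is a minimal timing sketch. QueryTimer and MedianMilliseconds are hypothetical names, not part of Entity Framework or any library. The idea is to discard a few warm-up runs, since the first call through a context usually pays one-off start-up costs, and then report the median of several timed runs rather than a single stopwatch reading.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: times an action repeatedly and reports the median.
public static class QueryTimer
{
    public static double MedianMilliseconds(Action action, int warmups = 3, int iterations = 20)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        // Warm-up runs are executed but not timed.
        for (int i = 0; i < warmups; i++) action();

        var samples = new double[iterations];
        for (int i = 0; i < iterations; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        // The median is less sensitive to a single slow outlier than the mean.
        Array.Sort(samples);
        return samples[iterations / 2]; // upper median when the count is even
    }
}
```

Assuming a repository object exposing the question's GetStringByID method, usage would look something like QueryTimer.MedianMilliseconds(() => repository.GetStringByID(127)), run once per candidate query. This only compares candidates on one machine; it says nothing about behaviour under a realistic load.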

However, I can't even tell you if any of these are Fast Enough. Depending on your needs you might need to hand-roll some hyper-optimized SQL and invoke a stored procedure. You might even need to go further and use a denormalised, read-optimized read store. What about using an in-memory cache for your database results? What about using an output cache for your webserver? What if this query isn't even the bottleneck?

Good performance isn't about speeding up Entity Framework queries. Good performance, like just about anything in our industry, is about knowing what's important to your customer, and figuring out the best way to get it.

The first thing I would recommend doing is calling ToString() on your LINQ queries to see the SQL being generated. Based on your query and your configuration, it is possible you are making two trips to the database: once to get Table2, then again to get the associated Table1 entity via lazy loading. You should try to verify whether this is the case, either with SQL Profiler or by stepping through with the debugger. See if rewriting your query like the following, which eagerly loads the related entity, improves performance:

var result = dataContext.Table2
             .Include("Table1") // eagerly load the related Table1 row in the same query
             .FirstOrDefault(x => x.Table2ID == ID);

if (result != null) {
    return result.Table1.StringValue;
} else {....}

Notice I also added some logic to check whether result is null. You are using FirstOrDefault, so accessing .Table1 will throw a NullReferenceException if no row is found. I would either change the call to First() if you never expect the result to be null, or handle the null case.
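
One way to handle the null case compactly is the null-conditional operator (C# 6 and later). The sketch below uses plain in-memory objects instead of a real context, and the Table1Row, Table2Row and NullSafeLookup types are hypothetical stand-ins for the entities in the question; it only illustrates what happens after FirstOrDefault has returned.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's entities.
public class Table1Row
{
    public string StringValue { get; set; }
}

public class Table2Row
{
    public long Table2ID { get; set; }
    public Table1Row Table1 { get; set; }
}

public static class NullSafeLookup
{
    // Returns null instead of throwing when no matching row exists.
    public static string GetStringByID(IEnumerable<Table2Row> rows, long id)
    {
        var result = rows.FirstOrDefault(x => x.Table2ID == id);

        // ?. short-circuits to null if result (or its Table1) is null.
        return result?.Table1?.StringValue;
    }
}
```

With this shape the caller decides what a missing row means, for example by falling back to a default with the ?? operator. Note that ?. cannot be used inside the lambda passed to an IQueryable operator, because expression trees do not support it; here it is applied only to the already materialised result.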

Another thing you should look at is how EF is configured to compare against NULL, as that could slow down your query. Check out this post (not to link to my own post, but it's relevant): EntityFramework LINQToEntities generate weird slow TSQL Where-Clause

This should yield the correct result, but I cannot tell how efficient it is; you will have to profile it. Note that the query really only fetches a single string from the database and does not require any client-side processing by the Entity Framework.

dataContext.Table2
           .Where(x => (x.Table2ID == ID) || (x.Table2ID == null))
           .OrderByDescending(x => x.Table2ID) // This will place ID before NULL.
           .Select(x => x.Table1.StringValue)
           .First()

Using LINQPad I got more or less the expected SQL statement, but I did not check whether the Entity Framework produces the same query. Because this is a single query, there is even a slight chance that the Entity Framework can outperform the stored procedure, with its conditional second query, but obviously only because of the reformulated query.

 SELECT TOP (1) [t1].[StringValue]
           FROM [Table2] AS [t2]
LEFT OUTER JOIN [Table1] AS [t1]
             ON [t1].[Table1ID] = [t2].[Table1ID]
          WHERE ([t2].[Table2ID] = @ID) OR ([t2].[Table2ID] IS NULL)
       ORDER BY [t2].[Table2ID] DESC
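
The ordering trick can be checked in memory with LINQ to Objects. This is only a sketch: FallbackRow and FallbackLookup are hypothetical names, and it assumes Table2ID is nullable, as the query above implies. In LINQ to Objects a null key sorts below every non-null value, which matches how SQL Server orders NULLs; other databases may order NULLs differently, so the generated SQL still needs to be verified against the real server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened row: Table2ID plus the joined Table1 string.
public class FallbackRow
{
    public long? Table2ID { get; set; }
    public string StringValue { get; set; }
}

public static class FallbackLookup
{
    // Prefers the row matching id; falls back to the row whose Table2ID is null.
    public static string Find(IEnumerable<FallbackRow> rows, long id)
    {
        return rows.Where(x => x.Table2ID == id || x.Table2ID == null)
                   .OrderByDescending(x => x.Table2ID) // non-null IDs sort ahead of null
                   .Select(x => x.StringValue)
                   .First();                           // throws if neither row exists
    }
}
```

This mirrors the if/else of the stored procedure in one pass: when the requested ID exists it wins the ordering, and otherwise only the NULL row survives the filter.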
