Performance of Linq query by type

A discussion has come up at work:

We've got a class that has an IList<Fact>. Fact is an abstract base class, and there are several concrete subclasses (PopulationFact, GdpFact, etc).

Originally we'd query for a given fact in this fashion, i.e. by type:

.Facts.FirstOrDefault(x => x.Year == 2011 && x is GdpFact)

Now, however, the question's been raised whether we should introduce a FactType enum instead, and do:

.Facts.FirstOrDefault(x => x.Year == 2011 && x.FactType == FactType.Gdp)

The suggestion has been raised because it is supposedly faster. I'll admit that I've not written any tests to try and discern the difference in performance, but I have two questions:

1) Is 'querying on type' like this inherently bad?
2) Isn't adding a FactType enum just superfluous anyway, given that facts are strongly typed?

UPDATE To clarify, this is LINQ to Objects and GdpFact : Fact.

UPDATE 2 We've measured using current typical data (4 facts) and the results are in:

lookup on enum: 0.29660000000000003 milliseconds
lookup on type: 0.24530000000000002 milliseconds

So type lookup is faster in this context! I will choose my accepted answer carefully.

All performance-related questions depend strictly on the concrete application context, so the answers provided here may be only partially right for your concrete case.

Having this in mind:

Checking an enum value should be reasonably faster than checking for a type, since in the first case you just compare two integers (the enum values) for equality.

But that introduces one more field in the object, which has to be tracked (with unit tests) to make sure it holds the correct value. You do not need that in the second case, because the CLR takes care of correct type initialization.
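The kind of unit test this implies could look roughly like the sketch below. It is only an illustration under assumptions: the class layout mirrors the question, the enum is named TypeOfFact to avoid clashing with the property name, FactTypeCheck is a hypothetical helper, and every concrete subclass is assumed to have a public parameterless constructor.

```csharp
using System;
using System.Linq;

public enum TypeOfFact { Gdp, Population }

public abstract class Fact
{
    public int Year { get; set; }
    public abstract TypeOfFact FactType { get; }
}

public class GdpFact : Fact
{
    public override TypeOfFact FactType { get { return TypeOfFact.Gdp; } }
}

public class PopulationFact : Fact
{
    public override TypeOfFact FactType { get { return TypeOfFact.Population; } }
}

// Hypothetical helper, not part of the original code base.
public static class FactTypeCheck
{
    // True when no two concrete Fact subclasses in this assembly
    // report the same enum value.
    public static bool EnumValuesAreUnique()
    {
        var values = typeof(Fact).Assembly.GetTypes()
            .Where(t => t.IsSubclassOf(typeof(Fact)) && !t.IsAbstract)
            // Assumes a public parameterless constructor on each subclass.
            .Select(t => ((Fact)Activator.CreateInstance(t)).FactType)
            .ToList();

        return values.Count == values.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(EnumValuesAreUnique());
    }
}
```

A check like this catches a subclass that copies another subclass's enum value, but it is extra code that exists only to keep the enum in step with the type hierarchy.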

I think it's better to profile both ideas against a relevant amount of data, the amount your app usually operates over, and let the results point you to the right choice.

I've done a test; my results for 1000000 iterations are approximately:

ByCast 166ms
ByType 84ms
ByEnum 98ms

So the enum is in fact superfluous and slower, but not by much. This should not be too surprising; the type system is fundamental to the .NET Framework.

Test code transcribed below, apologies for errata:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    private enum TypeOfFact
    {
        Gdp,
        Other
    }

    private abstract class Fact
    {
        public virtual int Year { get; set; }
        public abstract TypeOfFact FactType { get; }
    }

    private class GdpFact : Fact
    {
        public override TypeOfFact FactType
        {
            get { return TypeOfFact.Gdp; }
        }
    }

    private class OtherFact : Fact
    {
        public override TypeOfFact FactType
        {
            get { return TypeOfFact.Other; }
        }
    }

    static void Main()
    {
        IList<Fact> facts = new List<Fact>
            {
                new GdpFact { Year = 2010 },
                new OtherFact { Year = 2010 },
                new GdpFact { Year = 2009 },
                new OtherFact { Year = 2009 },
                new GdpFact { Year = 2011 },
                new OtherFact { Year = 2011 },
            };

        const int iterations = 1000000;

        var funcs = new List<Func<IList<Fact>, Fact>>
            {
                ByCast,
                ByType,
                ByEnum
            };

        // Warmup
        foreach (var func in funcs)
        {
           Measure(5, func, facts);
        }

        // Results
        foreach (var result in funcs.Select(f => new
            {
                Description = f.Method.Name,
                Ms = Measure(iterations, f, facts)
            }))
        {
            Console.WriteLine(
                "{0} time = {1}ms",
                result.Description,
                result.Ms);
        }
    }

    private static long Measure(
        int iterations,
        Func<IList<Fact>, Fact> func,
        IList<Fact> facts)
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        for (var i = 0; i < iterations; i++)
        {
            func.Invoke(facts);
        }

        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }

    private static Fact ByType(IList<Fact> facts)
    {
        return facts.FirstOrDefault(f =>
            f.Year == 2011 && f is GdpFact);
    }

    private static Fact ByEnum(IList<Fact> facts)
    {
        return facts.FirstOrDefault(f =>
            f.Year == 2011 && f.FactType == TypeOfFact.Gdp);
    }

    private static Fact ByCast(IList<Fact> facts)
    {
        return facts.OfType<GdpFact>()
            .FirstOrDefault(f => f.Year == 2011);
    }
}

This question seems relevant.

Is this maybe a solution looking for a problem?

I think having both concrete subtypes and an enum in the base type could potentially obfuscate your design. You could imagine someone coming along later and writing a new concrete class but not realising they needed to add to the enum as well...

Unless you find you have a specific problem to do with performance, I'd be tempted to prioritise clarity instead. Therefore if you need different concrete classes (and I'm assuming you do, since that's how you've coded it to start with) then I'd stick with your types rather than move to an enum.

I think your original approach is fine. The 'is' keyword is provided for this purpose. MSDN does not discourage the use of 'is'. Using an enum seems to be over-engineering. We should try and keep code simple. The fewer lines of code, the better it is in most situations.
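For reference, a minimal sketch of the type check on its own. The class names mirror the question; the Currency property and the IsDemo class are hypothetical and exist only to show that, from C# 7 onward, 'is' can also bind the matched object to a typed variable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class Fact
{
    public int Year { get; set; }
}

public class GdpFact : Fact
{
    public string Currency { get; set; } // hypothetical property
}

public class PopulationFact : Fact { }

public static class IsDemo
{
    // Same query as in the question: filter on year and runtime type.
    public static Fact FindGdp(IList<Fact> facts, int year)
    {
        return facts.FirstOrDefault(x => x.Year == year && x is GdpFact);
    }

    public static void Main()
    {
        IList<Fact> facts = new List<Fact>
        {
            new PopulationFact { Year = 2011 },
            new GdpFact { Year = 2011, Currency = "USD" },
        };

        Fact found = FindGdp(facts, 2011);

        // C# 7+ pattern form: test the type and get a typed variable in one step.
        if (found is GdpFact gdp)
        {
            Console.WriteLine(gdp.Currency); // prints "USD"
        }
    }
}
```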

It's possible that checking an enum value will be faster than a runtime type-check, but...

  1. It's a micro-optimisation -- you're unlikely to notice much perf difference in real-world scenarios.
  2. It makes things more complicated, and more complicated means more likely to break.
    For example, what's to stop you, or one of your colleagues, accidentally doing something like this?

     public class PopulationFact : Fact
     {
         public FactType FactType = FactType.GdpFact; // should be PopulationFact
     }

I'd stick with the type-check. There's actually a built-in LINQ method that'll do it for you:

.Facts.OfType<GdpFact>().FirstOrDefault(x => x.Year == 2011)
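A minimal, self-contained sketch of the OfType approach follows, with class names mirroring the question and OfTypeDemo as a hypothetical wrapper. OfType<GdpFact>() works on a sequence, so it has to be applied before FirstOrDefault, and the result is then already typed as GdpFact.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class Fact
{
    public int Year { get; set; }
}

public class GdpFact : Fact { }

public class PopulationFact : Fact { }

public static class OfTypeDemo
{
    public static GdpFact FindGdp(IEnumerable<Fact> facts, int year)
    {
        // OfType<GdpFact>() keeps only elements whose runtime type is
        // GdpFact (or a subclass) and yields them as GdpFact, so no cast
        // is needed on the result.
        return facts.OfType<GdpFact>().FirstOrDefault(x => x.Year == year);
    }

    public static void Main()
    {
        var facts = new List<Fact>
        {
            new PopulationFact { Year = 2011 },
            new GdpFact { Year = 2010 },
            new GdpFact { Year = 2011 },
        };

        GdpFact hit = FindGdp(facts, 2011);
        GdpFact miss = FindGdp(facts, 1999);

        Console.WriteLine(hit != null);  // True
        Console.WriteLine(miss == null); // True
    }
}
```

In the benchmark posted in this thread this form (ByCast) was the slowest of the three variants, so the appeal here is the typed result and readability rather than speed.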
