Efficiency of using IEqualityComparer in Dictionary vs HashCode and Equals()

The title is pretty much clear, I think.

I was wondering whether there is an efficiency overhead when using an IEqualityComparer in a Dictionary<K,V>. How does it all work when you provide one?

Thanks

Is it Faster?

Coming from a gamedev perspective, if your key is a value type (struct, primitive, enum, etc.), providing your own EqualityComparer<T> can be significantly faster, because EqualityComparer<T>.Default may box the value.

As a real-world example, the Managed DirectX billboard sample used to run at ~30% of the speed of the C++ version, whereas all the other samples ran at ~90%. The reason was that the billboards were being sorted using the default comparer (and thus being boxed); as it turns out, 4MB of data was being copied around every frame because of this.

How does it work?

When you use the default constructor, Dictionary<K,V> supplies itself with EqualityComparer<T>.Default. What the default equality comparer does is basically the following (notice how much boxing occurs):

// Simplified: both methods convert the value to object, which boxes a value type.
public int GetHashCode(T value)
{
   return ((object)value).GetHashCode();
}

public bool Equals(T first, T second)
{
   return ((object)first).Equals((object)second);
}
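For comparison, here is a minimal sketch of a hand-written comparer for a struct key. The GridPoint type, the GridPointComparer class, and the hash formula are made up for illustration. Because both methods take GridPoint parameters directly, nothing needs to be converted to object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type used only for this illustration.
public struct GridPoint
{
    public int X;
    public int Y;

    public GridPoint(int x, int y) { X = x; Y = y; }
}

// Compares GridPoint values field by field. The parameters are typed
// as GridPoint, so no conversion to object takes place here.
public sealed class GridPointComparer : IEqualityComparer<GridPoint>
{
    public bool Equals(GridPoint a, GridPoint b)
    {
        return a.X == b.X && a.Y == b.Y;
    }

    public int GetHashCode(GridPoint p)
    {
        // Simple illustrative hash; unchecked lets the multiplication wrap.
        unchecked { return (p.X * 397) ^ p.Y; }
    }
}

public static class Program
{
    public static void Main()
    {
        var tiles = new Dictionary<GridPoint, string>(new GridPointComparer());
        tiles.Add(new GridPoint(1, 2), "grass");

        // A separate value with the same fields finds the entry.
        Console.WriteLine(tiles[new GridPoint(1, 2)]);             // prints "grass"
        Console.WriteLine(tiles.ContainsKey(new GridPoint(2, 1))); // prints "False"
    }
}
```

Whether this is measurably faster than the default depends on the key type and the runtime, so it is worth profiling before committing to it.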

Why would I ever use it?

It's quite common to see this kind of code (when trying to have case-insensitive keys):

var dict = new Dictionary<string, int>();
dict.Add(myParam.ToUpperInvariant(), fooParam);
// ...
var val = dict[myParam.ToUpperInvariant()];

This is really wasteful; it is better to just pass a StringComparer to the constructor:

var dict = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
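As a small usage sketch (the keys and values here are made up), a dictionary built this way accepts any casing of the key on lookup and on update, without allocating an upper-cased copy of the string each time:

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
    public static void Main()
    {
        var dict = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        dict.Add("Foo", 1);

        // Lookups succeed regardless of the casing used by the caller.
        Console.WriteLine(dict["FOO"]);             // prints "1"
        Console.WriteLine(dict.ContainsKey("foo")); // prints "True"

        // Writing through a differently cased key updates the same entry.
        dict["fOO"] = 2;
        Console.WriteLine(dict.Count);              // prints "1"
        Console.WriteLine(dict["Foo"]);             // prints "2"
    }
}
```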

Is it faster (redux)?

In this specific scenario it is a lot faster, because ordinal string comparisons are the fastest type of string comparison you can do. A quick benchmark:

static void Main(string[] args)
{
    var d1 = new Dictionary<string, int>();
    var d2 = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    d1.Add("FOO", 1);
    d2.Add("FOO", 1);

    Stopwatch s = new Stopwatch();
    s.Start();
    RunTest1(d1, "foo");
    s.Stop();
    Console.WriteLine("ToUpperInvariant: {0}", s.Elapsed);

    s.Reset();
    s.Start();
    RunTest2(d2, "foo");
    s.Stop();
    Console.WriteLine("OrdinalIgnoreCase: {0}", s.Elapsed);

    Console.ReadLine();
}

static void RunTest1(Dictionary<string, int> values, string val)
{
    for (var i = 0; i < 10000000; i++)
    {
        values[val.ToUpperInvariant()] = values[val.ToUpperInvariant()];
    }
}

static void RunTest2(Dictionary<string, int> values, string val)
{
    for (var i = 0; i < 10000000; i++)
    {
        values[val] = values[val];
    }
}

// ToUpperInvariant: 00:00:04.5084119
// OrdinalIgnoreCase: 00:00:02.1211549
// 2x faster.

Reservations

It is possible to eliminate the boxing overhead by implementing an interface on a struct (such as IEquatable<T>). However, there are many surprising rules for when boxing occurs under these circumstances, so I would recommend using the paired interface (e.g. IEqualityComparer<T> in this case) if at all possible.

Jonathan has a great answer that points out how using the right equality comparer improves performance, and Jon clarifies in his great answer that Dictionary<K, V> always uses an IEqualityComparer<T>, which is EqualityComparer<T>.Default unless you specify another.

The thing I'd like to touch upon is the role of the IEquatable<T> interface when you use the default equality comparer.

When you call EqualityComparer<T>.Default, it uses a cached comparer if there is one. If it's the first time you're using the default equality comparer for that type, it calls a method called CreateComparer and caches the result for later use. Here is the trimmed and simplified implementation of CreateComparer in .NET 4.5:

var t = (RuntimeType)typeof(T);

// If T is byte,
// return a ByteEqualityComparer.

// If T implements IEquatable<T>,
if (typeof(IEquatable<T>).IsAssignableFrom(t))
    return (EqualityComparer<T>)
           RuntimeTypeHandle.CreateInstanceForAnotherGenericParameter(
               (RuntimeType)typeof(GenericEqualityComparer<int>), t);

// If T is a Nullable<U> where U implements IEquatable<U>,
// return a NullableEqualityComparer<U>

// If T is an int-based Enum,
// return an EnumEqualityComparer<T>

// Otherwise return an ObjectEqualityComparer<T>

But what does it mean for types that implement IEquatable<T>?
Here is the definition of GenericEqualityComparer<T>:

internal class GenericEqualityComparer<T> : EqualityComparer<T>
    where T: IEquatable<T>
// ...

The magic happens in the generic type constraint (the where T : IEquatable<T> part): calling Equals through the constraint does not involve boxing if T is a value type. No cast like (IEquatable<T>)T happens here, which is the primary benefit of generics.

So, let's say we want a dictionary that maps integers to strings.
What happens if we initialize one using the default constructor?

var dict = new Dictionary<int, string>();

  • We know that a dictionary uses EqualityComparer<T>.Default unless we specify another.
  • We know that EqualityComparer<int>.Default will check if int implements IEquatable<int>.
  • We know that int (Int32) implements IEquatable<Int32>.

The first call to EqualityComparer<T>.Default will create and cache a generic comparer, which may take a little while, but once initialized it is a strongly typed GenericEqualityComparer<T>, and using it will cause no boxing or unnecessary overhead whatsoever.

All subsequent calls to EqualityComparer<T>.Default will return the cached comparer, which means the initialization overhead is paid only once per type.
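One way to observe this is a struct key that counts which Equals overload the dictionary ends up calling. This is only a sketch: the Id type and its counters are hypothetical, and the exact number of typed calls may vary with the runtime. The point is that the object-based overload, the one that would require boxing the argument, should not be hit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct key that records which Equals overload gets called.
public struct Id : IEquatable<Id>
{
    public static int TypedCalls;
    public static int ObjectCalls;

    public readonly int Value;

    public Id(int value) { Value = value; }

    // Strongly typed overload: the argument arrives as an Id, no boxing needed.
    public bool Equals(Id other)
    {
        TypedCalls++;
        return Value == other.Value;
    }

    // Object overload: a caller has to box an Id to pass it here.
    public override bool Equals(object obj)
    {
        ObjectCalls++;
        return obj is Id && Value == ((Id)obj).Value;
    }

    public override int GetHashCode() { return Value; }
}

public static class Program
{
    public static void Main()
    {
        // No comparer passed, so the dictionary uses EqualityComparer<Id>.Default.
        var names = new Dictionary<Id, string>();
        names.Add(new Id(7), "seven");
        string found = names[new Id(7)];

        Console.WriteLine(found);
        Console.WriteLine("Equals(Id) calls: " + Id.TypedCalls);
        Console.WriteLine("Equals(object) calls: " + Id.ObjectCalls);
    }
}
```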


So what does it all mean?

  • Do implement a custom equality comparer if T does not implement IEquatable<T> or its implementation of IEquatable<T> does not do what you want it to do.
    (i.e. obj1.Equals(obj2) doesn't give you the desired result.)

The use of StringComparer in Jonathan's answer is a great example of why you would specify a custom equality comparer.

  • Do not implement a custom equality comparer for the sake of performance if T implements IEquatable<T> and the implementation of IEquatable<T> does what you want it to do.
    (i.e. obj1.Equals(obj2) gives you the desired result.)

In the latter case, use EqualityComparer<T>.Default instead.
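The same advice applies to your own generic code. The sketch below uses a hypothetical helper, IndexOfValue, that relies on EqualityComparer<T>.Default instead of calling object.Equals directly, so it picks up the IEquatable<T> path for types that implement it and also copes with null elements.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceHelpers
{
    // Hypothetical helper: index of the first element equal to target, or -1.
    public static int IndexOfValue<T>(IList<T> items, T target)
    {
        // Fetched once; Default selects a comparer suited to T.
        EqualityComparer<T> comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < items.Count; i++)
        {
            if (comparer.Equals(items[i], target))
            {
                return i;
            }
        }
        return -1;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(SequenceHelpers.IndexOfValue(new[] { 10, 20, 30 }, 20));  // prints "1"
        Console.WriteLine(SequenceHelpers.IndexOfValue(new[] { "a", null }, null)); // prints "1"
        Console.WriteLine(SequenceHelpers.IndexOfValue(new[] { "a", "b" }, "z"));   // prints "-1"
    }
}
```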

Dictionary<,> always uses an IEqualityComparer<TKey>; if you don't pass one, it uses EqualityComparer<T>.Default. So the efficiency will depend on how efficient your implementation is compared with EqualityComparer<T>.Default (which just delegates to Equals and GetHashCode).

I had a lot of trouble writing a correct EqualityComparer. The critical part was GetHashCode, which was generating duplicate keys when the key type was object[] and there were more than 20k records. Below is the solution:

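public class ObJectArrayEqualityComparer : IEqualityComparer<object[]>
{
    // Treats null and DBNull.Value as the same "missing" value.
    private static bool IsMissing(object value)
    {
        return value == null || value == DBNull.Value;
    }

    public bool Equals(object[] x, object[] y)
    {
        if (ReferenceEquals(x, y))
        {
            return true;
        }
        if (x == null || y == null || x.Length != y.Length)
        {
            return false;
        }
        for (int i = 0; i < x.Length; i++)
        {
            var tempX = x[i];
            var tempY = y[i];
            if (IsMissing(tempX) || IsMissing(tempY))
            {
                // Equal only if both are missing; keep checking the remaining elements.
                if (IsMissing(tempX) && IsMissing(tempY))
                {
                    continue;
                }
                return false;
            }

            if (!tempX.Equals(tempY)
                && !System.Collections.StructuralComparisons.StructuralEqualityComparer.Equals(tempX, tempY))
            {
                return false;
            }
        }
        return true;
    }

    public int GetHashCode(object[] obj)
    {
        // unchecked: let the arithmetic wrap instead of throwing on overflow.
        unchecked
        {
            int result = 0;
            for (int i = 0; i < obj.Length; i++)
            {
                // Missing values hash to 0 so that null and DBNull.Value agree with Equals.
                // The structural comparer keeps nested arrays consistent with Equals.
                int itemHash = IsMissing(obj[i])
                    ? 0
                    : System.Collections.StructuralComparisons.StructuralEqualityComparer.GetHashCode(obj[i]);
                result = result + (itemHash * (65 + i));
            }
            return result;
        }
    }
}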
