Why is it important to override GetHashCode when Equals method is overridden?

Given the following class:

public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;

        if (fooItem == null) 
        {
           return false;
        }

        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Which is preferred?

        return base.GetHashCode();

        //return this.FooId.GetHashCode();
    }
}

I have overridden the Equals method because Foo represents a row in the Foos table. Which is the preferred method for overriding GetHashCode?

Why is it important to override GetHashCode?

Yes, it is important if your item will be used as a key in a dictionary, or HashSet<T>, etc., since this is used (in the absence of a custom IEqualityComparer<T>) to group items into buckets. If the hash codes for two items do not match, they may never be considered equal (Equals will simply never be called).

The GetHashCode() method should reflect the Equals logic; the rules are:

  • if two things are equal (Equals(...) == true) then they must return the same value for GetHashCode()
  • if the GetHashCode() values are equal, it is not necessary for the two things to be equal; this is a collision, and Equals will be called to see if it is a real equality or not.
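
The two rules above can be illustrated with a small sketch. BucketedFoo is a hypothetical class (it is not from the question) whose hash code is deliberately coarse, so that unequal objects collide; the dictionary should still tell them apart because it falls back to Equals on a collision.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration only: the hash code is deliberately coarse.
public sealed class BucketedFoo
{
    public int FooId { get; }

    public BucketedFoo(int fooId) { FooId = fooId; }

    public override bool Equals(object obj)
    {
        BucketedFoo other = obj as BucketedFoo;
        return other != null && other.FooId == FooId;
    }

    // Rule 1 holds: equal FooId values always give equal hash codes.
    // Rule 2: unequal ids such as 1 and 11 share a hash code, which is allowed.
    public override int GetHashCode() { return FooId % 10; }
}

public static class HashRulesDemo
{
    public static void Main()
    {
        var map = new Dictionary<BucketedFoo, string>();
        map[new BucketedFoo(1)] = "one";
        map[new BucketedFoo(11)] = "eleven"; // collides with 1, but is a different key

        Console.WriteLine(map.Count);                // 2
        Console.WriteLine(map[new BucketedFoo(1)]);  // one
        Console.WriteLine(map[new BucketedFoo(11)]); // eleven
    }
}
```

A coarse hash like this stays correct but costs performance, since every colliding key has to be checked with Equals.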

In this case, it looks like "return FooId;" is a suitable GetHashCode() implementation. If you are testing multiple properties, it is common to combine them using code like below, to reduce diagonal collisions (i.e. so that new Foo(3,5) has a different hash code to new Foo(5,3)):

In modern frameworks, the HashCode type has methods to help you create a hash code from multiple values; on older frameworks, you'd need to go without, so something like:

unchecked // only needed if you're compiling with arithmetic checks enabled
{ // (the default compiler behaviour is *disabled*, so most folks won't need this)
    int hash = 13;
    hash = (hash * 7) + field1.GetHashCode();
    hash = (hash * 7) + field2.GetHashCode();
    ...
    return hash;
}
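
For the modern-framework route, a minimal sketch using System.HashCode.Combine might look like the following. Pair and its two properties are hypothetical names chosen for illustration, and the sketch assumes a target where System.HashCode is available (for example .NET Core 2.1 or later).

```csharp
using System;

// Hypothetical two-field type for illustration only.
public sealed class Pair
{
    public int First { get; }
    public int Second { get; }

    public Pair(int first, int second)
    {
        First = first;
        Second = second;
    }

    public override bool Equals(object obj)
    {
        Pair other = obj as Pair;
        return other != null && other.First == First && other.Second == Second;
    }

    // Combines exactly the fields that Equals compares.
    public override int GetHashCode() { return HashCode.Combine(First, Second); }
}

public static class CombineDemo
{
    public static void Main()
    {
        var a = new Pair(3, 5);
        var b = new Pair(3, 5);

        Console.WriteLine(a.Equals(b));                       // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
    }
}
```

The values produced by HashCode are seeded per process, so they are only meaningful within a single run and should not be persisted.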

Oh - for convenience, you might also consider providing == and != operators when overriding Equals and GetHashCode.
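
One possible shape for those operators is sketched below. FooKey is a hypothetical class, not the question's Foo; the point is that the operators delegate to Equals and use ReferenceEquals for the null checks, so they never call themselves.

```csharp
using System;

// Hypothetical class for illustration only.
public sealed class FooKey
{
    public int FooId { get; }

    public FooKey(int fooId) { FooId = fooId; }

    public override bool Equals(object obj)
    {
        FooKey other = obj as FooKey;
        // ReferenceEquals avoids going through the overloaded == operator.
        return !ReferenceEquals(other, null) && other.FooId == FooId;
    }

    public override int GetHashCode() { return FooId.GetHashCode(); }

    public static bool operator ==(FooKey left, FooKey right)
    {
        if (ReferenceEquals(left, null)) return ReferenceEquals(right, null);
        return left.Equals(right);
    }

    public static bool operator !=(FooKey left, FooKey right)
    {
        return !(left == right);
    }
}

public static class OperatorDemo
{
    public static void Main()
    {
        FooKey nothing = null;

        Console.WriteLine(new FooKey(1) == new FooKey(1)); // True
        Console.WriteLine(new FooKey(1) != new FooKey(2)); // True
        Console.WriteLine(nothing == null);                // True
        Console.WriteLine(new FooKey(1) == nothing);       // False
    }
}
```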


A demonstration of what happens when you get this wrong is here.

It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.

I finally found a solution to this problem when I was working with NHibernate. My approach is to calculate the hash code from the ID of the object. The ID can only be set through the constructor, so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.
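
A minimal sketch of that idea, assuming a hypothetical Entity base class (this is not an actual NHibernate mapping, which has its own requirements for how members are declared):

```csharp
using System;

// Hypothetical entity type for illustration only.
public class Entity
{
    // The ID has no setter, so the hash code cannot change after construction.
    public Guid Id { get; }

    // The parameterless constructor generates a random ID.
    public Entity() : this(Guid.NewGuid()) { }

    public Entity(Guid id) { Id = id; }

    public override bool Equals(object obj)
    {
        Entity other = obj as Entity;
        return other != null && other.Id == Id;
    }

    public override int GetHashCode() { return Id.GetHashCode(); }
}

public static class EntityDemo
{
    public static void Main()
    {
        var original = new Entity();
        var sameRow = new Entity(original.Id);

        Console.WriteLine(original.Equals(sameRow));                        // True
        Console.WriteLine(original.GetHashCode() == sameRow.GetHashCode()); // True
        Console.WriteLine(original.Equals(new Entity()));                   // False
    }
}
```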

By overriding Equals you're basically stating that you know better how to compare two instances of a given type.

Below you can see an example of how ReSharper writes a GetHashCode() function for you. Note that this snippet is meant to be tweaked by the programmer:

public override int GetHashCode()
{
    unchecked
    {
        var result = 0;
        result = (result * 397) ^ m_someVar1;
        result = (result * 397) ^ m_someVar2;
        result = (result * 397) ^ m_someVar3;
        result = (result * 397) ^ m_someVar4;
        return result;
    }
}

As you can see, it just tries to guess a good hash code based on all the fields in the class, but if you know your object's domain or value ranges you could still provide a better one.

Please don't forget to check the obj parameter against null when overriding Equals(), and also compare the type.

public override bool Equals(object obj)
{
    Foo fooItem = obj as Foo;

    if (fooItem == null)
    {
       return false;
    }

    return fooItem.FooId == this.FooId;
}

The reason for this is: Equals must return false on comparison to null. See also http://msdn.microsoft.com/en-us/library/bsc2ak47.aspx
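
The "as" cast above handles null, but it also lets an instance of a derived class compare equal to a base instance. If an exact type match is wanted, one possible variant compares GetType() as well. TypedFoo and SpecialFoo are hypothetical names used only for this sketch.

```csharp
using System;

// Hypothetical classes for illustration only.
public class TypedFoo
{
    public int FooId { get; }

    public TypedFoo(int fooId) { FooId = fooId; }

    public override bool Equals(object obj)
    {
        // Rejects null and any object whose runtime type differs from this one.
        if (obj == null || obj.GetType() != GetType())
        {
            return false;
        }

        return ((TypedFoo)obj).FooId == FooId;
    }

    public override int GetHashCode() { return FooId.GetHashCode(); }
}

public class SpecialFoo : TypedFoo
{
    public SpecialFoo(int fooId) : base(fooId) { }
}

public static class TypeCheckDemo
{
    public static void Main()
    {
        var plain = new TypedFoo(1);

        Console.WriteLine(plain.Equals(null));              // False
        Console.WriteLine(plain.Equals(new TypedFoo(1)));   // True
        Console.WriteLine(plain.Equals(new SpecialFoo(1))); // False
    }
}
```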

How about:

public override int GetHashCode()
{
    return string.Format("{0}_{1}_{2}", prop1, prop2, prop3).GetHashCode();
}

Assuming performance is not an issue :)

As of .NET 4.7 the preferred method of overriding GetHashCode() is shown below. If targeting older .NET versions, include the System.ValueTuple NuGet package.

// C# 7.0+
public override int GetHashCode() => (FooId, FooName).GetHashCode();

In terms of performance, this method will outperform most composite hash code implementations. The ValueTuple is a struct so there won't be any garbage, and the underlying algorithm is as fast as it gets.

Just to add to the above answers:

If you don't override Equals then the default behavior is that references of the objects are compared. The same applies to the hash code: the default implementation is based on the identity of the object instance (often described as its memory address), not on its contents. Because you did override Equals, the correct behavior is to compare whatever you implemented in Equals and not the references, so you should do the same for the hash code.

Clients of your class will expect the hash code to have similar logic to the Equals method. For example, LINQ methods which use an IEqualityComparer first compare the hash codes, and only if they are equal do they call Equals(), which might be more expensive to run. If we didn't implement the hash code, equal objects will probably have different hash codes (because they are different instances) and will wrongly be determined as not equal (Equals() won't even be hit).

There is also the problem that you might not be able to find your object if you used it in a dictionary: it was inserted under one hash code, and when you look for it the default hash code will probably be different, so again Equals() won't even be called, as Marc Gravell explains in his answer. In addition, you introduce a violation of the dictionary or hash set concept, which should not allow identical keys. You already declared that those objects are essentially the same when you overrode Equals, so you don't want both of them as different keys in a data structure which is supposed to have unique keys. But because they have different hash codes, the "same" key will be inserted as a different one.

It is because the framework requires that two objects that are the same must have the same hash code. If you override the Equals method to do a special comparison of two objects and the two objects are considered the same by the method, then the hash codes of the two objects must also be the same. (Dictionaries and Hashtables rely on this principle.)

We have two problems to cope with.

  1. You cannot provide a sensible GetHashCode() if any field in the object can be changed. Also, often an object will NEVER be used in a collection that depends on GetHashCode(). So the cost of implementing GetHashCode() is often not worth it, or it is not possible.

  2. If someone puts your object in a collection that calls GetHashCode() and you have overridden Equals() without also making GetHashCode() behave in a correct way, that person may spend days tracking down the problem.

Therefore, by default I do this:

public class Foo
{
    public int FooId { get; set; }
    public string FooName { get; set; }

    public override bool Equals(object obj)
    {
        Foo fooItem = obj as Foo;

        if (fooItem == null)
        {
           return false;
        }

        return fooItem.FooId == this.FooId;
    }

    public override int GetHashCode()
    {
        // Some comment to explain if there is a real problem with providing GetHashCode() 
        // or if I just don't see a need for it for the given class
        throw new Exception("Sorry I don't know what GetHashCode should do for this class");
    }
}

Hash codes are used by hash-based collections like Dictionary, Hashtable, HashSet, etc. The purpose of the hash code is to very quickly pre-sort a specific object by putting it into a specific group (bucket). This pre-sorting helps tremendously in finding the object when you need to retrieve it from the hash collection, because the code has to search for your object in just one bucket instead of among all the objects the collection contains. The better the distribution of hash codes (better uniqueness), the faster the retrieval. In the ideal situation where each object has a unique hash code, finding it is an O(1) operation. In most cases it approaches O(1).
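
The bucket idea can be sketched with a toy model. This is not how Dictionary or HashSet are actually implemented (real collections use more elaborate sizing and collision handling); BucketDemo and BucketIndex are hypothetical names, and the bucket count of 4 is arbitrary.

```csharp
using System;
using System.Collections.Generic;

public static class BucketDemo
{
    // Toy mapping from a hash code to a bucket index.
    public static int BucketIndex(int hashCode, int bucketCount)
    {
        // Clear the sign bit so the remainder is never negative.
        return (hashCode & 0x7FFFFFFF) % bucketCount;
    }

    public static void Main()
    {
        const int bucketCount = 4;
        var buckets = new List<int>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
        {
            buckets[i] = new List<int>();
        }

        // An int's hash code is the int itself, which keeps the example predictable.
        foreach (int key in new[] { 1, 2, 5, 8, 13 })
        {
            buckets[BucketIndex(key.GetHashCode(), bucketCount)].Add(key);
        }

        // A lookup for 13 only has to scan the bucket that 13 hashes to.
        int target = 13;
        List<int> candidates = buckets[BucketIndex(target.GetHashCode(), bucketCount)];
        Console.WriteLine(string.Join(", ", candidates)); // 1, 5, 13
    }
}
```

Here three of the five keys share a bucket, so the lookup still compares three candidates; a better spread of hash codes would shrink that list toward one.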

It's not necessarily important; it depends on the size of your collections and your performance requirements and whether your class will be used in a library where you may not know the performance requirements. I frequently know my collection sizes are not very large and my time is more valuable than a few microseconds of performance gained by creating a perfect hash code; so (to get rid of the annoying warning by the compiler) I simply use:

   public override int GetHashCode()
   {
      return base.GetHashCode();
   }

(Of course I could use a #pragma to turn off the warning as well but I prefer this way.)

When you are in the position that you do need the performance, then all of the issues mentioned by others here apply, of course. Most important, because otherwise you will get wrong results when retrieving items from a hash set or dictionary: the hash code must not vary during the lifetime of an object (more accurately, during any time the hash code is needed, such as while the object is a key in a dictionary). For example, the following is wrong, as Value is public and so can be changed from outside the class during the lifetime of the instance, so you must not use it as the basis for the hash code:


   class A
   {
      public int Value;

      public override int GetHashCode()
      {
         return Value.GetHashCode(); //WRONG! Value is not constant during the instance's life time
      }
   }    

On the other hand, if Value can't be changed it's OK to use:


   class A
   {
      public readonly int Value;

      public override int GetHashCode()
      {
         return Value.GetHashCode(); //OK  Value is read-only and can't be changed during the instance's life time
      }
   }
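
To make the failure concrete, here is a small sketch of what can happen when a mutable field feeds the hash code. MutableKey is a hypothetical class mirroring the "wrong" example above, with an Equals override added so that it is consistent with GetHashCode.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration only: the hash code depends on a mutable field.
public class MutableKey
{
    public int Value;

    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && other.Value == Value;
    }

    public override int GetHashCode() { return Value.GetHashCode(); }
}

public static class MutableKeyDemo
{
    public static void Main()
    {
        var set = new HashSet<MutableKey>();
        var key = new MutableKey { Value = 1 };
        set.Add(key);

        Console.WriteLine(set.Contains(key)); // True

        // The set filed the item under the old hash code; changing Value strands it.
        key.Value = 2;

        Console.WriteLine(set.Contains(key)); // False
        Console.WriteLine(set.Count);         // 1 (the item is still stored, just unreachable)
    }
}
```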

You should always guarantee that if two objects are equal, as defined by Equals(), they return the same hash code. As some of the other comments state, in theory this is not mandatory if the object will never be used in a hash-based container like HashSet or Dictionary. I would advise you to always follow this rule though. The reason is simply that it is way too easy for someone to change a collection from one type to another with the good intention of actually improving the performance or just conveying the code semantics in a better way.

For example, suppose we keep some objects in a List. Sometime later someone realizes that a HashSet is a much better alternative, because of its better search characteristics for example. This is when we can get into trouble. List internally uses the default equality comparer for the type, which means Equals in your case, while HashSet makes use of GetHashCode(). If the two behave differently, so will your program. And bear in mind that such issues are not the easiest to troubleshoot.
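
A sketch of that List versus HashSet difference follows. Item is a hypothetical class whose GetHashCode is deliberately inconsistent with its Equals (it mixes in a field that Equals ignores), so the same lookup gives different answers in the two collections.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class for illustration only: Equals and GetHashCode disagree on purpose.
public sealed class Item
{
    public int Id { get; }
    public int Version { get; }

    public Item(int id, int version)
    {
        Id = id;
        Version = version;
    }

    // Equality looks at Id only.
    public override bool Equals(object obj)
    {
        Item other = obj as Item;
        return other != null && other.Id == Id;
    }

    // BUG (intentional): Version is not part of Equals but changes the hash code.
    public override int GetHashCode() { return Id * 31 + Version; }
}

public static class ListVersusSetDemo
{
    public static void Main()
    {
        var stored = new Item(1, 1);
        var probe = new Item(1, 2); // equal to stored according to Equals

        var list = new List<Item> { stored };
        var set = new HashSet<Item> { stored };

        Console.WriteLine(list.Contains(probe)); // True  (found via Equals)
        Console.WriteLine(set.Contains(probe));  // False (hash codes 32 and 33 differ)
    }
}
```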

I've summarized this behavior with some other GetHashCode() pitfalls in a blog post where you can find further examples and explanations.

As of C# 9 (.NET 5 or .NET Core 3.1), you may want to use records, as they use value-based equality by default.
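
As a minimal sketch, assuming a compiler with C# 9 support, a positional record gets Equals, GetHashCode, == and != generated from its members. FooRecord is a hypothetical name; note that the generated equality compares every member, not just FooId as in the question's Equals.

```csharp
using System;

// Hypothetical record for illustration only (requires C# 9 or later).
public record FooRecord(int FooId, string FooName);

public static class RecordDemo
{
    public static void Main()
    {
        var a = new FooRecord(1, "first");
        var b = new FooRecord(1, "first");
        var c = new FooRecord(1, "renamed");

        Console.WriteLine(a == b);                             // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        Console.WriteLine(ReferenceEquals(a, b));              // False
        Console.WriteLine(a == c);                             // False (FooName differs)
    }
}
```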

It's my understanding that the original GetHashCode() returns the memory address of the object, so it's essential to override it if you wish to compare two different objects.

EDITED: That was incorrect; the original GetHashCode() method cannot assure the equality of 2 values, though objects that are equal return the same hash code.

Using reflection, as below, seems to me a better option when considering public properties, as with this you don't have to worry about the addition or removal of properties (although that is not such a common scenario). I also found this to perform better (compared time using the Diagnostics Stopwatch).

    // Requires: using System.Reflection;
    public override int GetHashCode()
    {
        PropertyInfo[] theProperties = this.GetType().GetProperties();
        int hash = 31;
        foreach (PropertyInfo info in theProperties)
        {
            if (info != null)
            {
                // Read the current value of each public property.
                var value = info.GetValue(this, null);
                if (value != null)
                {
                    unchecked
                    {
                        hash = 29 * hash ^ value.GetHashCode();
                    }
                }
            }
        }
        return hash;
    }
