General advice and guidelines on how to properly override object.GetHashCode()

According to MSDN, a hash function must have the following properties:

  1. If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.

  2. The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

  3. For the best performance, a hash function must generate a random distribution for all input.


I keep finding myself in the following scenario: I have created a class, implemented IEquatable<T> and overridden object.Equals(object). MSDN states that:

Types that override Equals must also override GetHashCode; otherwise, Hashtable might not work correctly.

And that is where I usually get stuck, because how do you properly override object.GetHashCode()? I never really know where to start, and there seem to be a lot of pitfalls.

Here at StackOverflow, there are quite a few questions related to overriding GetHashCode, but most of them seem to be about quite particular cases and specific issues. So I would like to get a good compilation here: an overview with general advice and guidelines. What to do, what not to do, common pitfalls, where to start, etc.

I would like it to be especially directed at C#, but I would think it works much the same way for other .NET languages as well(?).


I think maybe the best way is to create one answer per topic, with a quick and short answer first (close to a one-liner if at all possible), then maybe some more information, ending with related questions, discussions, blog posts, etc., if there are any. I can then create one post as the accepted answer (to get it on top) with just a "table of contents". Try to keep it short and concise. And don't just link to other questions and blog posts. Try to capture the essence of them and then link to the source (especially since the source could disappear). Also, please try to edit and improve answers instead of creating lots of very similar ones.

I am not a very good technical writer, but I will at least try to format answers so they look alike, create the table of contents, etc. I will also try to search for some of the related questions here at SO that answer parts of this, and maybe pull out the essence of the ones I can manage. But since I am not on very firm ground with this topic, I will try to stay away for the most part :p

Table of contents


Things that I would like to be covered, but haven't been yet:

  • How to create the integer (how to "convert" an object into an int wasn't very obvious to me anyway).
  • What fields to base the hash code upon.
    • If it should only be on immutable fields, what if there are only mutable ones?
  • How to generate a good random distribution. (MSDN Property #3)
    • Part of this seems to be choosing a good magic prime number (I have seen 17, 23 and 397 used), but how do you choose it, and what exactly is it for?
  • How to make sure the hash code stays the same all through the object lifetime. (MSDN Property #2)
    • Especially when the equality is based upon mutable fields. (MSDN Property #1)
  • How to deal with fields that are complex types (not among the built-in C# types).
    • Complex objects and structs, arrays, collections, lists, dictionaries, generic types, etc.
    • For example, even though the list or dictionary might be readonly, that doesn't mean the contents of it are.
  • How to deal with inherited classes.
    • Should you somehow incorporate base.GetHashCode() into your hash code?
  • Could you technically just be lazy and return 0? That would heavily break MSDN guideline #3, but would at least make sure #1 and #2 were always true :P
  • Common pitfalls and gotchas.

What are those magic numbers often seen in GetHashCode implementations?

They are prime numbers. Prime numbers are used for creating hash codes because prime numbers maximize the usage of the hash code space.

Specifically, start with the small prime number 3, and consider only the low-order three bits of the results (that is, the results modulo 8):

  • 3 * 1 = 3 = 3 (mod 8) = 011
  • 3 * 2 = 6 = 6 (mod 8) = 110
  • 3 * 3 = 9 = 1 (mod 8) = 001
  • 3 * 4 = 12 = 4 (mod 8) = 100
  • 3 * 5 = 15 = 7 (mod 8) = 111
  • 3 * 6 = 18 = 2 (mod 8) = 010
  • 3 * 7 = 21 = 5 (mod 8) = 101
  • 3 * 8 = 24 = 0 (mod 8) = 000
  • 3 * 9 = 27 = 3 (mod 8) = 011

And we start over. But you'll notice that successive multiples of our prime generated every possible 3-bit pattern before starting to repeat. We can get the same effect with any odd prime and any number of bits (what matters is that the multiplier shares no factor with the power of two), which makes prime numbers a good fit for generating near-random hash codes. The reason we usually see larger primes instead of small primes like 3 in the example above is that, for greater numbers of bits in our hash code, the results obtained from using a small prime are not even pseudo-random: they're simply an increasing sequence until an overflow is encountered. For optimal randomness, a prime number that results in overflow for fairly small coefficients should be used, unless you can guarantee that your coefficients will not be small.
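
The wrap-around behaviour in the list above can be checked with a few lines of C#. This is only an illustrative sketch: `MultiplierDemo` and `Residues` are names made up for the example, and a real hash table does more than take multiples modulo a power of two.

```csharp
using System;
using System.Collections.Generic;

public static class MultiplierDemo
{
    // Returns (multiplier * k) mod modulus for k = 1..modulus, in order.
    public static List<int> Residues(int multiplier, int modulus)
    {
        var result = new List<int>();
        for (int k = 1; k <= modulus; k++)
        {
            result.Add((multiplier * k) % modulus);
        }
        return result;
    }

    public static void Main()
    {
        // 3 shares no factor with 8, so all eight residues 0..7 show up.
        Console.WriteLine(string.Join(", ", Residues(3, 8)));
        // 4 shares a factor with 8, so only 4 and 0 ever show up.
        Console.WriteLine(string.Join(", ", Residues(4, 8)));
    }
}
```

With an even multiplier such as 4, most of the 3-bit patterns are never produced, which is the wasted hash code space that an odd prime avoids.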

Related links:

Check out Eric Lippert's Guidelines and rules for GetHashCode.

You should override it whenever you have a meaningful measure of equality for objects of that type (i.e. you override Equals). If you knew the object wasn't going to be hashed for any reason you could leave it, but it's unlikely you could know this in advance.

The hash should be based only on the properties of the object that are used to define equality, since two objects that are considered equal should have the same hash code. In general you would usually do something like:


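public override int GetHashCode()
{
    const int mc = 397; // magic constant, usually some prime (397 is only an example)
    unchecked // let the product wrap around on overflow instead of throwing
    {
        // One factor per equality-defining property, prop1 through propN.
        return mc * prop1.GetHashCode() * prop2.GetHashCode() * /* ... */ propN.GetHashCode();
    }
}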

I usually assume multiplying the values together will produce a fairly uniform distribution, assuming each property's hash code function does the same, although this may well be wrong. Using this method, if the object's equality-defining properties change, then the hash code is also likely to change, which is acceptable given definition #2 in your question. It also deals with all types in a uniform way.
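
One way the "this may well be wrong" caveat shows up: with a pure product, any property whose hash code is 0 (an int field holding 0, for instance) forces the whole result to 0. A commonly seen alternative is multiply-and-add. The sketch below compares the two on a hypothetical `Triple` class; the class, the `ProductHash` helper and the constants 17, 23 and 397 are illustrative choices, not required values.

```csharp
using System;

public sealed class Triple : IEquatable<Triple>
{
    public int A { get; }
    public int B { get; }
    public int C { get; }

    public Triple(int a, int b, int c) { A = a; B = b; C = c; }

    public bool Equals(Triple other) =>
        other != null && A == other.A && B == other.B && C == other.C;

    public override bool Equals(object obj) => Equals(obj as Triple);

    // The pure-product scheme, kept only for comparison.
    public int ProductHash()
    {
        unchecked { return 397 * A.GetHashCode() * B.GetHashCode() * C.GetHashCode(); }
    }

    // Multiply-and-add: a zero in one field no longer wipes out the others.
    public override int GetHashCode()
    {
        unchecked // overflow wraps around instead of throwing
        {
            int hash = 17;
            hash = hash * 23 + A.GetHashCode();
            hash = hash * 23 + B.GetHashCode();
            hash = hash * 23 + C.GetHashCode();
            return hash;
        }
    }
}

public static class TripleDemo
{
    public static void Main()
    {
        var x = new Triple(0, 1, 2);
        var y = new Triple(0, 5, 9);
        // Both products collapse to 0 because A is 0.
        Console.WriteLine(x.ProductHash() == y.ProductHash());
        // The multiply-and-add hashes still differ.
        Console.WriteLine(x.GetHashCode() == y.GetHashCode());
    }
}
```

Both schemes satisfy the rule that equal objects get equal hash codes; the difference is only in how well unequal objects are spread out.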

You could return the same value for all instances, although this will make any algorithms that use hashing (such as dictionaries) very slow: essentially all instances will be hashed to the same bucket, and lookup will then become O(n) instead of the expected O(1). This of course negates any benefits of using such structures for lookup.

Why do I have to override object.GetHashCode()?

Overriding this method is important because the following property must always remain true:

If two objects compare as equal, the GetHashCode method for each object must return the same value.

The reason, as stated by JaredPar in a blog post on implementing equality, is that

Many classes use the hash code to classify an object. In particular hash tables and dictionaries tend to place objects in buckets based on their hash code. When checking if an object is already in the hash table it will first look for it in a bucket. If two objects are equal but have different hash codes they may be put into different buckets and the dictionary would fail to lookup the object.
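
The bucket problem described in the quote can be reproduced with a small sketch. `BrokenName`, `FixedName` and `BucketDemo` are hypothetical names used only for this illustration; the broken lookup will usually (though not provably always) fail, because the default hash codes of two distinct instances are normally different.

```csharp
using System;
using System.Collections.Generic;

#pragma warning disable CS0659 // deliberately overrides Equals without GetHashCode
public sealed class BrokenName
{
    public string Value { get; }
    public BrokenName(string value) { Value = value; }
    public override bool Equals(object obj) =>
        obj is BrokenName other && Value == other.Value;
}
#pragma warning restore CS0659

public sealed class FixedName
{
    public string Value { get; }
    public FixedName(string value)
    {
        Value = value ?? throw new ArgumentNullException(nameof(value));
    }
    public override bool Equals(object obj) =>
        obj is FixedName other && Value == other.Value;
    // Equal values now always produce equal hash codes.
    public override int GetHashCode() => Value.GetHashCode();
}

public static class BucketDemo
{
    public static bool BrokenLookupSucceeds()
    {
        var set = new HashSet<BrokenName> { new BrokenName("a") };
        return set.Contains(new BrokenName("a"));
    }

    public static bool FixedLookupSucceeds()
    {
        var set = new HashSet<FixedName> { new FixedName("a") };
        return set.Contains(new FixedName("a"));
    }

    public static void Main()
    {
        Console.WriteLine(BrokenLookupSucceeds()); // usually False
        Console.WriteLine(FixedLookupSucceeds());
    }
}
```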

Related links:

A) You must override both Equals and GetHashCode if you want to employ value equality instead of the default reference equality. With the latter, two object references compare as equal if they both refer to the same object instance. With the former, they compare as equal if their value is the same, even if they refer to different objects. For example, you probably want to employ value equality for Date, Money, and Point objects.

B) In order to implement value equality you must override Equals and GetHashCode. Both should depend on the fields of the object that encapsulate the value. For example, Date.Year, Date.Month and Date.Day; or Money.Currency and Money.Amount; or Point.X, Point.Y and Point.Z. You should also consider overriding operator ==, operator !=, operator <, and operator >.

C) The hash code doesn't have to stay constant all through the object lifetime. However, it must not change while the object participates as a key in a hash table. From the MSDN documentation for Dictionary: "As long as an object is used as a key in the Dictionary<TKey, TValue>, it must not change in any way that affects its hash value." If you must change the value of a key, remove the entry from the dictionary, change the key value, and add the entry again.

D) IMO, you will simplify your life if your value objects are themselves immutable.
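
A minimal sketch of point C, assuming a hypothetical `MutableKey` class whose hash code is simply its `Id`. Changing the key while it sits in the dictionary typically makes the entry unreachable, whereas remove, change, re-add keeps it findable.

```csharp
using System;
using System.Collections.Generic;

public sealed class MutableKey
{
    public int Id { get; set; }
    public override bool Equals(object obj) => obj is MutableKey other && Id == other.Id;
    public override int GetHashCode() => Id;
}

public static class MutableKeyDemo
{
    // Mutates the key while it is still stored in the dictionary.
    public static bool FoundAfterInPlaceChange()
    {
        var key = new MutableKey { Id = 1 };
        var map = new Dictionary<MutableKey, string> { [key] = "value" };
        key.Id = 2; // the entry was filed under the old hash code
        return map.ContainsKey(key);
    }

    // Removes the entry, changes the key, then adds the entry again.
    public static bool FoundAfterRemoveChangeAdd()
    {
        var key = new MutableKey { Id = 1 };
        var map = new Dictionary<MutableKey, string> { [key] = "value" };
        string value = map[key];
        map.Remove(key);
        key.Id = 2;
        map.Add(key, value);
        return map.ContainsKey(key);
    }

    public static void Main()
    {
        Console.WriteLine(FoundAfterInPlaceChange());
        Console.WriteLine(FoundAfterRemoveChangeAdd());
    }
}
```

An immutable key type, as suggested in point D, rules the first case out entirely.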

When do I override object.GetHashCode()?

As MSDN states:

Types that override Equals must also override GetHashCode; otherwise, Hashtable might not work correctly.

Related links:

What fields to base the hash code upon? If it should only be on immutable fields, what if there are only mutable ones?

It doesn't need to be based only on immutable fields. I would base it on the fields that determine the outcome of the Equals method.

How to make sure the hash code stays the same all through the object lifetime. (MSDN Property #2) Especially when the equality is based upon mutable fields. (MSDN Property #1)

You seem to misunderstand Property #2. The hash code doesn't need to stay the same throughout the object's lifetime. It just needs to stay the same as long as the values that determine the outcome of the Equals method are not changed. So logically, you base the hash code on those values only. Then there shouldn't be a problem.

public override int GetHashCode()
{
    // XOR together the hash codes of the values that Equals compares.
    return IntProp1 ^ IntProp2 ^ StrProp3.GetHashCode() ^ StrProp4.GetHashCode() ^ CustomClassProp.GetHashCode();
}

Do the same in the custom class's GetHashCode method. Works like a charm.
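
One caveat with plain XOR, sketched below: it ignores the order of its operands, and two equal values cancel to 0, so objects like (1, 2) and (2, 1) always collide. The example contrasts it with `System.HashCode.Combine`, which is assumed to be available here (it ships with newer .NET versions, not with the older .NET Framework). `XorDemo` and its method names are hypothetical.

```csharp
using System;

public static class XorDemo
{
    // Plain XOR of two int fields, as in the answer above.
    public static int XorHash(int a, int b) => a ^ b;

    // Order-sensitive alternative; results may differ between program runs.
    public static int CombinedHash(int a, int b) => HashCode.Combine(a, b);

    public static void Main()
    {
        Console.WriteLine(XorHash(1, 2) == XorHash(2, 1));           // True: XOR ignores order
        Console.WriteLine(XorHash(7, 7));                            // 0: equal values cancel out
        Console.WriteLine(CombinedHash(1, 2) == CombinedHash(1, 2)); // True within one run
    }
}
```

Collisions like these do not break correctness, since Equals still has the final say; they only cost lookup speed.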
