
Is Object.GetHashCode() unique to a reference or a value?

The MSDN documentation on Object.GetHashCode() describes three rules for how the method should work, and they seem to contradict each other.

  1. If two objects of the same type represent the same value, the hash function must return the same constant value for either object.
  2. For the best performance, a hash function must generate a random distribution for all input.
  3. The hash function must return exactly the same value regardless of any changes that are made to the object.

Rules 1 & 3 are contradictory to me.

Does Object.GetHashCode() return a unique number based on the value of an object, or on the reference to the object? If I override the method I can choose what to use, but I'd like to know what is used internally, if anyone knows.

"Rules 1 & 3 are contradictory to me."

To a certain extent, they are. The reason is simple: if an object is stored in a hash table and you change its hash by changing its value, the hash table has lost the object and you can't find it again by querying the hash table. It is important that objects retain their hash value while they are stored in a hash table.

To achieve this, it is often simplest to make hashable objects immutable, which avoids the whole problem. It is, however, sufficient to make immutable only those fields that determine the hash value.
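The failure described above can be reproduced with a small sketch. MutableKey and LostKeyDemo are hypothetical names made up for this illustration; the only assumption is the standard HashSet<T> collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: its hash code depends on a mutable field.
sealed class MutableKey
{
    public int Value;

    public override bool Equals(object obj) =>
        obj is MutableKey other && other.Value == Value;

    public override int GetHashCode() => Value;
}

static class LostKeyDemo
{
    // Returns whether the set still finds the key after its hashed field changed.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey { Value = 1 };
        var set = new HashSet<MutableKey> { key };

        key.Value = 2; // changes the hash code while the object is stored

        return set.Contains(key);
    }

    public static void Main()
    {
        // The set looks the key up under its new hash code and misses the stored entry.
        Console.WriteLine(FoundAfterMutation()); // prints False
    }
}
```

The object is still inside the set, but a lookup can no longer reach it, which is why the hashed fields should not change while the object is stored.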

Consider the following example:

struct Person {
    public readonly string FirstName;
    public readonly string Name;
    public readonly DateTime Birthday;

    public int ShoeSize;
}

People rarely change their birthday and most people never change their name (except when marrying). However, their shoe size may grow arbitrarily, or even shrink. It is therefore reasonable to identify people using their birthday and name but not their shoe size. The hash value should reflect this:

// Inside Person: hash only the immutable fields that identify a person.
public override int GetHashCode() {
    return FirstName.GetHashCode() ^ Name.GetHashCode() ^ Birthday.GetHashCode();
}
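The hash method above only helps if Equals compares the same fields. Below is a minimal sketch of the complete struct under that assumption. The constructor, the Equals methods, and PersonDemo are additions for illustration and are not part of the original answer. System.HashCode.Combine is swapped in for the XOR; it assumes a runtime that ships System.HashCode (for example .NET Core 2.1 or later).

```csharp
using System;

struct Person : IEquatable<Person>
{
    public readonly string FirstName;
    public readonly string Name;
    public readonly DateTime Birthday;

    public int ShoeSize; // mutable, deliberately excluded from equality and hashing

    public Person(string firstName, string name, DateTime birthday, int shoeSize)
    {
        FirstName = firstName;
        Name = name;
        Birthday = birthday;
        ShoeSize = shoeSize;
    }

    // Equality uses exactly the fields that feed the hash code.
    public bool Equals(Person other) =>
        FirstName == other.FirstName && Name == other.Name && Birthday == other.Birthday;

    public override bool Equals(object obj) => obj is Person other && Equals(other);

    public override int GetHashCode() => HashCode.Combine(FirstName, Name, Birthday);
}

static class PersonDemo
{
    // Changing the shoe size must not change the hash code.
    public static bool SameHashAfterShoeSizeChange()
    {
        var p = new Person("Ada", "Lovelace", new DateTime(1815, 12, 10), 36);
        int before = p.GetHashCode();
        p.ShoeSize = 38;
        return before == p.GetHashCode();
    }

    public static void Main()
    {
        Console.WriteLine(SameHashAfterShoeSizeChange()); // prints True
    }
}
```

One reason to prefer a combining helper over plain XOR is that XOR is symmetric, so swapping FirstName and Name would produce the same hash value.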

I'm not sure which MSDN documentation you are referring to. The current documentation on Object.GetHashCode ( http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx ) provides the following "rules":

  • If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.

  • The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

  • For the best performance, a hash function must generate a random distribution for all input.

If you are referring to the second bullet point, the key phrases here are "as long as there is no modification to the object state" and "true only for the current execution of an application".

Also from the documentation:

A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object. Hash functions are usually specific to each Type and must use at least one of the instance fields as input. [Emphasis added is mine.]

As for the actual implementation, the documentation clearly states that derived classes can defer to the Object.GetHashCode implementation if and only if that derived class defines value equality to be reference equality and the type is not a value type. In other words, the default implementation of Object.GetHashCode is based on reference equality, since there are no real instance fields to use, and it therefore does not guarantee unique return values for different objects. Otherwise, your implementation should be specific to your type and should use at least one of your instance fields. As an example, String.GetHashCode returns identical hash codes for identical string values: two String objects return the same hash code if they represent the same string value, and the implementation uses all the characters in the string to generate that hash value.
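The difference between the inherited behaviour and String can be observed with a short sketch. Box and DefaultHashDemo are hypothetical names for this illustration. RuntimeHelpers.GetHashCode is the framework method that calls the Object.GetHashCode implementation non-virtually.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical class that overrides neither Equals nor GetHashCode.
sealed class Box
{
    public int Content;
}

static class DefaultHashDemo
{
    // The inherited hash code is tied to the instance, not to its fields.
    public static bool DefaultHashSurvivesMutation()
    {
        var box = new Box { Content = 1 };
        int before = box.GetHashCode();
        box.Content = 2;
        return before == box.GetHashCode()
            && before == RuntimeHelpers.GetHashCode(box);
    }

    // Two distinct String instances with equal contents return the same hash code.
    public static bool EqualStringsHashAlike()
    {
        string s1 = new string('a', 3);
        string s2 = new string('a', 3);
        return !ReferenceEquals(s1, s2) && s1.GetHashCode() == s2.GetHashCode();
    }

    public static void Main()
    {
        Console.WriteLine(DefaultHashSurvivesMutation()); // prints True
        Console.WriteLine(EqualStringsHashAlike());       // prints True
    }
}
```

Note that nothing here shows two different Box instances getting different hash codes; as the documentation says, that is not guaranteed.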

Rules 1 & 3 aren't really a contradiction.

For a reference type, the hash code is by default derived from a reference to the object: change an object's property and the reference stays the same.

For value types, the hash code is derived from the value: change a property of a value type and you get a completely new instance of the value type.
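A small sketch of the value-type side, assuming a struct that relies on the inherited ValueType.Equals and ValueType.GetHashCode. Point and ValueTypeHashDemo are hypothetical names for this illustration. The sketch only checks what the contract guarantees (equal values give equal hash codes); it does not claim that the default hash uses every field.

```csharp
using System;

// Hypothetical struct without any overrides.
struct Point
{
    public int X;
    public int Y;
}

static class ValueTypeHashDemo
{
    // A copy with the same field values is equal and returns the same hash code.
    public static bool CopiesHashAlike()
    {
        var a = new Point { X = 1, Y = 2 };
        Point b = a; // assignment copies the value
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }

    // Changing the copy leaves the original untouched and makes the two unequal.
    public static bool CopyIsIndependent()
    {
        var a = new Point { X = 1, Y = 2 };
        Point b = a;
        b.X = 5;
        return a.X == 1 && !a.Equals(b);
    }

    public static void Main()
    {
        Console.WriteLine(CopiesHashAlike());   // prints True
        Console.WriteLine(CopyIsIndependent()); // prints True
    }
}
```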

A very good explanation of how to handle GetHashCode (beyond the Microsoft rules) is given on the blog of Eric Lippert (co-designer of C#), in the article "Guidelines and rules for GetHashCode". It is not good practice to add hyperlinks here (since they can become invalid), but this one is worth it, and with the information above you will probably still find it if the hyperlink is lost.

I can't know for sure how Object.GetHashCode is implemented in the real .NET Framework, but in Rotor it uses the object's SyncBlock index as the hash code. There are some blog posts about it on the web, but most of them are from 2005.

By default it is based on the reference to the object, so two references return the same hash only because they point to the exact same object. But a hash should be based on the value, as in the case of the string class: "a" and "b" would have different hashes, but "a" and "a" would return the same hash.
