简体   繁体   中英

Is Object.GetHashCode() unique to a reference or a value?

The MSDN documentation on Object.GetHashCode() describes 3 contradicting rules for how the method should work.

  1. If two objects of the same type represent the same value, the hash function must return the same constant value for either object.
  2. For the best performance, a hash function must generate a random distribution for all input.
  3. The hash function must return exactly the same value regardless of any changes that are made to the object.

Rules 1 & 3 are contradictory to me.

Does Object.GetHashCode() return a unique number based on the value of an object, or the reference to the object. If I override the method I can choose what to use, but I'd like to know what is used internally if anyone knows.

Rules 1 & 3 are contradictory to me.

To a certain extent, they are. The reason is simple: if an object is stored in a hash table and, by changing its value, you change its hash then the hash table has lost the value and you can't find it again by querying the hash table. It is important that while objects are stored in a hash table, they retain their hash value.

To realize this it is often simplest to make hashable objects immutable, thus evading the whole problem. It is however sufficient to make only those fields immutable that determine the hash value.

Consider the following example:

struct Person {
    public readonly string FirstName;
    public readonly string Name;
    public readonly DateTime Birthday;

    public int ShoeSize;
}

People rarely change their birthday and most people never change their name (except when marrying). However, their shoe size may grow arbitrarily, or even shrink. It is therefore reasonable to identify people using their birthday and name but not their shoe size. The hash value should reflect this:

public int GetHashCode() {
    return FirstName.GetHashCode() ^ Name.GetHashCode() ^ Birthday.GetHashCode();
}

Not sure what MSDN documentation you are referring to. Looking at the current documentation on Object.GetHashCode ( http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx ) provides the following "rules":

  • If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

  • The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

  • For the best performance, a hash function must generate a random distribution for all input.

If you are referring to the second bullet point, the key phrases here are "as long as there is no modification to the object state" and "true only for the current execution of an application".

Also from the documentation,

A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object. Hash functions are usually specific to each Type and must use at least one of the instance fields as input. [ Emphasis added is mine. ]

As for the actual implementation, it clearly states that derived classes can defer to the Object.GetHashCode implementation if and only if that derived class defines value equality to be reference equality and the type is not a value type. In other words, the default implementation of Object.GetHashCode is going to be based on reference equality since there are no real instance fields to use and, therefore, does not guarantee unique return values for different objects. Otherwise, your implementation should be specific to your type and should use at least one of your instance fields. As an example, the implementation of String.GetHashCode returns identical hash codes for identical string values, so two String objects return the same hash code if they represent the same string value, and uses all the characters in the string to generate that hash value.

Rules 1 & 3 aren't really a contradiction.

For a reference type the hash code is derived from a reference to the object - change an object's property and the reference is the same.

For value types the hash code is derived from the value, change a property of a value type and you get a completely new instance of the value type.

A very good explanation on how to handle GetHashCode (beyond Microsoft rules) is given in Eric Lipperts (co. Designer of C#) Blog with the article " Guidelines and rules for GetHashCode ". It is not good practice to add hyperlinks here (since they can get invalid) but this one is worth it, and provided the information above one will probably still find it in case the hyperlink is lost.

I can't know for sure how Object.GetHashCode is implemented in real .NET Framework, but in Rotor it uses SyncBlock index for the object as hashcode. There are some blog posts about it on the web, however most of them are from 2005.

By default it does it based on the reference to the object, but that means that it's the exact same object, so both would return the same hash. But a hash should be based on the value, like in the case of the string class. "a" and "b" would have a different hash, but "a" and "a" would return the same hash.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM