
GetHashCode() with string keys

Hey all, I've been reading up on the best way to implement the GetHashCode() override for objects in .NET, and most answers I run across involve somehow munging together the values of numeric members to produce a hash. Problem is, I have an object that uses an alphanumeric string as its key, and I'm wondering if there's something fundamentally wrong with just using an internal ID for objects with strings as keys, something like the following?


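// Override GetHashCode() to return a permanent, unique identifier for
// this object. (KeyedItem is a placeholder for the actual type name.)
class KeyedItem
{
    private static int m_next_hash_id = 1;
    private int m_hash_code = 0;   // 0 means "no ID assigned yet"

    public override int GetHashCode()
    {
        // Hand out the next ID the first time a hash code is requested.
        if (this.m_hash_code == 0)
            this.m_hash_code = KeyedItem.m_next_hash_id++;
        return this.m_hash_code;
    }
}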

Is there a better way to come up with a unique hash code for an object that uses an alphanumeric string as its key? (And no, the numeric parts of the alphanumeric strings aren't unique; some of these strings don't actually have numbers in them at all.) Any thoughts would be appreciated!

You can call GetHashCode() on the non-numeric values that you use in your object.

private string m_foo;
public override int GetHashCode()
{
    // Delegate to the string's own hash; return 0 if the key is null.
    return m_foo?.GetHashCode() ?? 0;
}
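The hash code only helps if Equals agrees with it. A minimal sketch, assuming a hypothetical `Product` class whose `Sku` string is its entire identity: Equals compares the key with ordinal comparison (the same comparison `string.GetHashCode()` is consistent with), and GetHashCode delegates to the key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose identity is an alphanumeric string key.
public sealed class Product : IEquatable<Product>
{
    // Identifying key; never changes after construction.
    public string Sku { get; }

    public Product(string sku)
    {
        Sku = sku ?? throw new ArgumentNullException(nameof(sku));
    }

    // Equality is defined purely by the key...
    public bool Equals(Product other) =>
        other != null && string.Equals(Sku, other.Sku, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as Product);

    // ...so the hash code is the key's hash code, keeping the two consistent.
    public override int GetHashCode() => Sku.GetHashCode();
}

public static class Program
{
    public static void Main()
    {
        var prices = new Dictionary<Product, decimal>();
        prices[new Product("AB-12x")] = 9.99m;

        // A different instance with the same key finds the entry.
        Console.WriteLine(prices.ContainsKey(new Product("AB-12x"))); // True
    }
}
```

Making the key read-only is a deliberate choice: it keeps the hash code stable for as long as the object sits in a dictionary or set.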

This is not a good pattern for generating hashes for an object.

It's important to understand the purpose of GetHashCode(): it's a way to generate a numeric representation of the identifying properties of an object. Hash codes are used to allow an object to serve as a key in a dictionary and in some cases accelerate comparisons between complex types.

If you simply generate a random value and call it a hash code, you have no repeatability. Another instance with the same key fields will have a different hash code, violating the behavior expected by classes like HashSet, Dictionary, etc.
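To see the failure concretely, here is a sketch of the question's counter-based scheme paired with a hypothetical Equals that compares the string key (the question doesn't show its Equals, so that part is an assumption). Two instances that compare equal still get different IDs, so a dictionary lookup with a fresh instance misses.

```csharp
using System;
using System.Collections.Generic;

// The question's counter-based hash, plus an assumed key-based Equals.
// The two methods disagree, which is the bug being demonstrated.
public sealed class CounterKeyed
{
    private static int s_nextHashId = 1;
    private int _hashCode;              // 0 means "not assigned yet"
    public string Key { get; }

    public CounterKeyed(string key) { Key = key; }

    public override bool Equals(object obj) =>
        obj is CounterKeyed other && other.Key == Key;

    public override int GetHashCode()
    {
        if (_hashCode == 0)
            _hashCode = s_nextHashId++;  // unrelated to Key
        return _hashCode;
    }
}

public static class Program
{
    public static bool LookupSucceeds()
    {
        var map = new Dictionary<CounterKeyed, int>();
        map[new CounterKeyed("abc")] = 1;
        // Equal according to Equals(), but the new instance gets a new ID,
        // so the lookup does not find the stored entry.
        return map.ContainsKey(new CounterKeyed("abc"));
    }

    public static void Main() => Console.WriteLine(LookupSucceeds()); // False
}
```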

If you already have an identifying string member in your object, just return its hash code.

The documentation on MSDN for implementers of GetHashCode() is a must read for anyone that plans on overriding that method:

Notes to Implementers

A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object. Hash functions are usually specific to each Type and, for uniqueness, must use at least one of the instance fields as input.

A hash function must have the following properties:

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

For the best performance, a hash function must generate a random distribution for all input.

For example, the implementation of the GetHashCode method provided by the String class returns identical hash codes for identical string values. Therefore, two String objects return the same hash code if they represent the same string value. Also, the method uses all the characters in the string to generate reasonably randomly distributed output, even when the input is clustered in certain ranges (for example, many users might have strings that contain only the lower 128 ASCII characters, even though a string can contain any of the 65,535 Unicode characters).
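The second rule in those notes has a practical flip side: if you change the state that Equals and GetHashCode depend on while the object is stored in a hash-based collection, the collection files it under the old hash and can no longer find it. A small sketch with a hypothetical `MutableKeyed` type whose key is (unwisely) settable:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose key can be changed after construction.
public sealed class MutableKeyed
{
    public string Key { get; set; }
    public MutableKeyed(string key) { Key = key; }

    public override bool Equals(object obj) =>
        obj is MutableKeyed other && other.Key == Key;

    public override int GetHashCode() => Key.GetHashCode();
}

public static class Program
{
    public static bool FoundAfterMutation()
    {
        var item = new MutableKeyed("abc");
        var set = new HashSet<MutableKeyed> { item };

        // Changes the state that Equals and GetHashCode depend on.
        item.Key = "xyz";
        // Same instance, but the set stored it under the hash of "abc",
        // so this lookup normally fails.
        return set.Contains(item);
    }

    public static void Main() => Console.WriteLine(FoundAfterMutation());
}
```

This is one reason the identifying string is usually best kept immutable.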

Hash codes don't have to be unique. Provided your Equals implementation is correct, it's OK to return the same hash code for two instances. The m_next_hash_id logic is broken because it allows two objects to have different hash codes even when they compare equal.
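As a sketch of "not unique is fine", the hypothetical `WeakHashKey` below uses the key's length as its hash, so many distinct keys collide. Lookups stay correct because Equals makes the final decision; a hash like this only costs performance, since colliding entries must be checked one by one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type with a deliberately weak hash: many distinct keys share
// a hash code, but Equals still decides identity, so lookups stay correct.
public sealed class WeakHashKey
{
    public string Key { get; }
    public WeakHashKey(string key) { Key = key; }

    public override bool Equals(object obj) =>
        obj is WeakHashKey other && other.Key == Key;

    // Equal keys have equal lengths, so the contract holds; the spread is just poor.
    public override int GetHashCode() => Key.Length;
}

public static class Program
{
    public static int Lookup(string key)
    {
        var map = new Dictionary<WeakHashKey, int>
        {
            [new WeakHashKey("ab1")] = 1,
            [new WeakHashKey("xy2")] = 2    // same hash code (3) as "ab1"
        };
        return map[new WeakHashKey(key)];
    }

    public static void Main() => Console.WriteLine(Lookup("xy2")); // 2
}
```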

MSDN gives a good set of instructions on how to implement Equals and GetHashCode. Several of the examples there implement GetHashCode in terms of the hash codes of an object's fields.
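When identity spans more than one field, the usual approach is to combine the hash codes of exactly the fields that Equals compares. A sketch with a hypothetical `PartNumber` identified by a string prefix plus a revision number, assuming `System.HashCode` is available (.NET Core 2.1+ / .NET Standard 2.1):

```csharp
using System;

// Hypothetical type identified by two fields rather than one.
public readonly struct PartNumber : IEquatable<PartNumber>
{
    public string Prefix { get; }
    public int Revision { get; }

    public PartNumber(string prefix, int revision)
    {
        Prefix = prefix;
        Revision = revision;
    }

    public bool Equals(PartNumber other) =>
        Prefix == other.Prefix && Revision == other.Revision;

    public override bool Equals(object obj) => obj is PartNumber other && Equals(other);

    // Combine the hash codes of the same fields Equals compares.
    // HashCode.Combine treats a null Prefix as hash 0.
    public override int GetHashCode() => HashCode.Combine(Prefix, Revision);
}

public static class Program
{
    public static void Main()
    {
        var a = new PartNumber("AB-7", 2);
        var b = new PartNumber("AB-7", 2);
        Console.WriteLine(a.Equals(b) && a.GetHashCode() == b.GetHashCode()); // True
    }
}
```

On older frameworks without `System.HashCode`, a common hand-rolled alternative is to multiply a running hash by a small prime and add each field's hash code inside an `unchecked` block.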

Yes, a better way would be to use the hash code of the string you already have. If the alphanumeric string defines the identity of your object, its hash code will do quite nicely as the hash code of your object.
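One caveat when reusing the string's hash: it must match how Equals compares the string. If, hypothetically, keys that differ only in case should identify the same object, both methods should go through the same `StringComparer`, since plain `Key.GetHashCode()` would generally give "ABC" and "abc" different hash codes. A sketch, assuming ordinal case-insensitive identity:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant: keys that differ only in case identify the same object.
public sealed class CaseInsensitiveItem
{
    private static readonly StringComparer KeyComparer = StringComparer.OrdinalIgnoreCase;

    public string Key { get; }

    public CaseInsensitiveItem(string key)
    {
        Key = key ?? throw new ArgumentNullException(nameof(key));
    }

    public override bool Equals(object obj) =>
        obj is CaseInsensitiveItem other && KeyComparer.Equals(Key, other.Key);

    // Same comparer as Equals, so equal keys always produce equal hashes.
    public override int GetHashCode() => KeyComparer.GetHashCode(Key);
}

public static class Program
{
    public static void Main()
    {
        var set = new HashSet<CaseInsensitiveItem> { new CaseInsensitiveItem("ABC") };
        Console.WriteLine(set.Contains(new CaseInsensitiveItem("abc"))); // True
    }
}
```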

The idea of incrementing a static field and using it as the hash code is a bad one. The hash code should have an even distribution across the space of possible values. This ensures, amongst other things, that the object will perform well when used as a key in a hashtable.

I believe you generally want GetHashCode() to return something that identifies the object by its value rather than by its instance. If I'm understanding the idea here, your method would make GetHashCode() return different hashes for two different objects with equivalent values, just because they're different instances.

GetHashCode() is meant to return a value derived from an object's value, not its reference, so that objects that compare equal also produce the same hash.
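If you're on C# 9 / .NET 5 or later, a positional record is one way to get value-based Equals and GetHashCode without writing them by hand. A sketch contrasting a hypothetical record `Tag` with an ordinary class `TagClass`, which keeps the reference-based defaults from `object`:

```csharp
using System;

// Positional record: the compiler generates value-based Equals,
// GetHashCode and ==/!= over its properties.
public record Tag(string Name);

// Ordinary class: Equals and GetHashCode default to reference identity.
public sealed class TagClass
{
    public string Name { get; }
    public TagClass(string name) { Name = name; }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(new Tag("x") == new Tag("x"));                // True
        Console.WriteLine(new TagClass("x").Equals(new TagClass("x"))); // False
    }
}
```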
