
GetHashCode() with string keys

Hey all, I've been reading up on the best way to implement the GetHashCode() override for objects in .NET, and most answers I run across involve somehow munging numbers together from members that are numeric types to come up with a hash. Problem is, I have an object that uses an alphanumeric string as its key, and I'm wondering if there's something fundamentally wrong with just using an internal ID for objects with strings as keys, something like the following?


// Override GetHashCode() to return a permanent, unique identifier for
// this object.
static private int m_next_hash_id = 1;
private int m_hash_code = 0;
public override int GetHashCode() {
  if (this.m_hash_code == 0)
    this.m_hash_code = <type>.m_next_hash_id++;
  return this.m_hash_code;
}

Is there a better way to come up with a unique hash code for an object that uses an alphanumeric string as its key? (And no, the numeric parts of the alphanumeric strings aren't unique; some of these strings don't have numbers in them at all.) Any thoughts would be appreciated!

You can call GetHashCode() on the non-numeric values that you use in your object.

private string m_foo;
public override int GetHashCode()
{
    return m_foo.GetHashCode();
}

The counter-based approach in the question is not a good pattern for generating hash codes for an object.

It's important to understand the purpose of GetHashCode() - it's a way to generate a numeric representation of the identifying properties of an object. Hash codes are used to allow an object to serve as a key in a dictionary and, in some cases, to accelerate comparisons between complex types.

If you simply generate a random or arbitrary value and call it a hash code, you have no repeatability. Another instance with the same key fields will have a different hash code, which violates the behavior expected by classes like HashSet, Dictionary, etc.

If you already have an identifying string member in your object, just return its hash code.
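As a rough sketch of that advice, here is how the override might look alongside a matching Equals. The Part class, its Key property, and the PartDemo helper are hypothetical names made up for illustration; they are not from the original post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration: identity is defined by a string key.
public sealed class Part : IEquatable<Part>
{
    public string Key { get; }

    public Part(string key)
    {
        Key = key ?? throw new ArgumentNullException(nameof(key));
    }

    // Two parts are equal when their keys match exactly (ordinal comparison).
    public bool Equals(Part other) =>
        other != null && string.Equals(Key, other.Key, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as Part);

    // Delegate to the string's own hash code so equal keys give equal hashes.
    public override int GetHashCode() => Key.GetHashCode();
}

public static class PartDemo
{
    // Returns true when a second instance with the same key finds the stored entry.
    public static bool LookupByEqualKey()
    {
        var stock = new Dictionary<Part, int> { [new Part("AB-12X")] = 5 };
        return stock.TryGetValue(new Part("AB-12X"), out int count) && count == 5;
    }
}
```

Because Equals and GetHashCode both depend only on Key, a freshly constructed Part with the same key can be used to look up an entry stored under a different instance.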

The documentation on MSDN for implementers of GetHashCode() is a must-read for anyone who plans on overriding that method:

Notes to Implementers

A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object. Hash functions are usually specific to each Type and, for uniqueness, must use at least one of the instance fields as input.

A hash function must have the following properties:

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

For the best performance, a hash function must generate a random distribution for all input.

For example, the implementation of the GetHashCode method provided by the String class returns identical hash codes for identical string values. Therefore, two String objects return the same hash code if they represent the same string value. Also, the method uses all the characters in the string to generate reasonably randomly distributed output, even when the input is clustered in certain ranges (for example, many users might have strings that contain only the lower 128 ASCII characters, even though a string can contain any of the 65,535 Unicode characters).
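The String behavior described in that last paragraph can be checked with a small sketch like the one below. StringHashDemo and its method name are hypothetical and exist only for this illustration.

```csharp
using System;

public static class StringHashDemo
{
    // Builds two separate string objects with the same characters and
    // reports whether they are distinct instances with matching hash codes.
    public static bool DistinctInstancesShareHash()
    {
        string a = new string(new[] { 'A', 'B', '-', '1', '2', 'X' });
        string b = new string(new[] { 'A', 'B', '-', '1', '2', 'X' });

        bool differentObjects = !ReferenceEquals(a, b);
        bool sameHash = a.GetHashCode() == b.GetHashCode();
        return differentObjects && sameHash && a == b;
    }
}
```

The actual numeric value is deliberately not shown: as the quoted notes say, it is only stable within the current execution of the application.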

Hash codes don't have to be unique. Provided your Equals implementation is correct, it's OK to return the same hash code for two instances. The m_next_hash_id logic is broken, since it allows two objects to have different hash codes even if they compare as equal.
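To make the failure concrete, here is a minimal sketch that reproduces the counter idea from the question and pairs it with a key-based Equals. CounterHashed and CounterDemo are hypothetical names, and the Equals override is an assumption, since the question does not show one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type reproducing the counter-based hash from the question.
public sealed class CounterHashed
{
    private static int s_nextId = 1;
    private int _hashCode;

    public string Key { get; }

    public CounterHashed(string key) { Key = key; }

    // Assumed for illustration: equality is defined by the string key.
    public override bool Equals(object obj) =>
        obj is CounterHashed other && Key == other.Key;

    // Broken: each instance gets its own number, so equal objects disagree.
    public override int GetHashCode()
    {
        if (_hashCode == 0) _hashCode = s_nextId++;
        return _hashCode;
    }
}

public static class CounterDemo
{
    // Returns false: the set misses an element that Equals says is present.
    public static bool SetFindsEqualInstance()
    {
        var set = new HashSet<CounterHashed> { new CounterHashed("AB-12X") };
        return set.Contains(new CounterHashed("AB-12X"));
    }
}
```

The two instances compare as equal, yet the lookup fails because HashSet compares hash codes before it ever calls Equals.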

MSDN gives a good set of instructions on how to implement Equals and GetHashCode. Several of the examples there implement GetHashCode in terms of the hash codes of an object's fields.
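For an object identified by more than one field, one common way to build on the fields' hash codes looks roughly like this. The Drawing type, its Number and Revision members, and the constants 17 and 31 are illustrative assumptions, not something taken from MSDN or the original post.

```csharp
using System;

// Hypothetical type: identity is a string number plus an integer revision.
public sealed class Drawing : IEquatable<Drawing>
{
    public string Number { get; }
    public int Revision { get; }

    public Drawing(string number, int revision)
    {
        Number = number ?? throw new ArgumentNullException(nameof(number));
        Revision = revision;
    }

    public bool Equals(Drawing other) =>
        other != null && Number == other.Number && Revision == other.Revision;

    public override bool Equals(object obj) => Equals(obj as Drawing);

    // Combine the hash codes of exactly the fields that Equals compares.
    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap on overflow instead of throwing
        {
            int hash = 17;
            hash = hash * 31 + Number.GetHashCode();
            hash = hash * 31 + Revision.GetHashCode();
            return hash;
        }
    }
}
```

The important design point is that GetHashCode uses the same fields as Equals, so two objects that compare as equal always produce the same hash code.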

Yes, a better way would be to use the hash code of the string you already have. If the alphanumeric string defines the identity of your object, its hash code will do quite nicely as the hash code of your object.

The idea of incrementing a static field and using it as the hash code is a bad one. The hash code should have an even distribution across the space of possible values. This ensures, amongst other things, that it will perform well when used as the key in a hashtable.
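To see why distribution matters for performance rather than correctness, consider this sketch of the opposite extreme: a hash code that is the same for every instance. ConstantHashed and DistributionDemo are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type whose hash code is deliberately constant.
public sealed class ConstantHashed
{
    public string Key { get; }

    public ConstantHashed(string key) { Key = key; }

    public override bool Equals(object obj) =>
        obj is ConstantHashed other && Key == other.Key;

    // Legal (equal objects share a hash code) but every instance collides.
    public override int GetHashCode() => 1;
}

public static class DistributionDemo
{
    // Adds n distinct keys and returns how many elements the set holds.
    public static int CountDistinct(int n)
    {
        var set = new HashSet<ConstantHashed>();
        for (int i = 0; i < n; i++)
            set.Add(new ConstantHashed("K" + i));
        return set.Count;
    }
}
```

The set still ends up with the right contents because Equals makes the final decision, but every Add has to compare against all earlier elements in the same bucket, so the hashtable degrades toward a linear search.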

I believe you generally want GetHashCode() to return something that identifies the object by its value rather than by its instance. If I'm understanding the idea here, your method would make GetHashCode() on two different objects with equivalent values return different hashes just because they're different instances.

GetHashCode() is meant to return a value that lets you compare two objects' values, not their references.
