What to return when overriding Object.GetHashCode() in classes with no immutable fields?

OK, before you get mad because there are hundreds of similar-sounding questions posted on the internet, I can assure you that I have just spent the last few hours reading all of them and have not found the answer to my question.

Background:

Basically, one of my large-scale applications had been suffering from a situation where some Bindings on the ListBox.SelectedItem property would stop working, or the program would crash, after an edit had been made to the currently selected item. I initially asked the question "'An item with the same key has already been added' Exception on selecting a ListBoxItem from code" here, but got no answers.

I hadn't had time to address that problem until this week, when I was given a number of days to sort it out. To cut a long story short, I found the reason for the problem: my data type classes had overridden the Equals method and therefore the GetHashCode method as well.

For those of you who are unaware of this issue, I discovered that you can only implement the GetHashCode method using immutable fields/properties. An excerpt from Harvey Kwok's answer to the Overriding GetHashCode() post explains this:

The problem is that GetHashCode is being used by Dictionary and HashSet collections to place each item in a bucket. If hashcode is calculated based on some mutable fields and the fields are really changed after the object is placed into the HashSet or Dictionary, the object can no longer be found from the HashSet or Dictionary.

So the actual problem was caused because I had used mutable properties in the GetHashCode methods. When users changed these property values in the UI, the associated hash code values of the objects changed and then items could no longer be found in their collections.
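A minimal sketch of that failure mode, assuming a hypothetical OrderLine class (not taken from the application described above) whose hash code is derived from a mutable Number property:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data type: equality and hash code both depend on a mutable property.
public sealed class OrderLine
{
    public int Number { get; set; }

    public override bool Equals(object obj) =>
        obj is OrderLine other && Number == other.Number;

    // The hash code changes whenever Number changes.
    public override int GetHashCode() => Number;
}

public static class MutableKeyDemo
{
    // Reports whether the set can locate the same instance
    // before and after its Number property is edited.
    public static (bool Before, bool After) Run()
    {
        var line = new OrderLine { Number = 1 };
        var set = new HashSet<OrderLine> { line };

        bool before = set.Contains(line);
        line.Number = 2;                  // edited while still stored in the set
        bool after = set.Contains(line);  // looked up under the new hash code

        return (before, after);
    }
}
```

The instance is still physically in the set after the edit, but the lookup uses the new hash code while the set recorded the old one, so Contains no longer finds it.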

Question:

So, my question is: what is the best way of handling the situation where I need to implement the GetHashCode method in classes with no immutable fields? Sorry, let me be more specific, as that question has been asked before.

The answers in the Overriding GetHashCode() post suggest that in these situations it is better to simply return a constant value. Some suggest returning the value 1, while others suggest returning a prime number. Personally, I can't see any difference between these suggestions, because I would have thought that only one bucket would be used for either of them.

Furthermore, the "Guidelines and rules for GetHashCode" article on Eric Lippert's blog has a section titled "Guideline: the distribution of hash codes must be 'random'", which highlights the pitfalls of using an algorithm that results in not enough buckets being used. He warns of algorithms that decrease the number of buckets used and cause a performance problem "when the bucket gets really big". Surely, returning a constant falls into this category.

I had an idea of adding an extra Guid field to all of my data type classes (just in C#, not the database), to be used only in the GetHashCode method. So I suppose, at the end of this long intro, my actual question is: which implementation is better? To summarise:

Summary:

When overriding Object.GetHashCode() in classes with no immutable fields, is it better to return a constant from the GetHashCode method, or to create an additional readonly field for each class, solely to be used in the GetHashCode method? If I should add a new field, what type should it be, and shouldn't I then include it in the Equals method?

While I am happy to receive answers from anyone, I am really hoping to receive answers from advanced developers with a sound knowledge of this subject.

Go back to basics. You read my article; read it again. The two ironclad rules that are relevant to your situation are:

  • if x equals y then the hash code of x must equal the hash code of y. Equivalently: if the hash code of x does not equal the hash code of y then x and y must be unequal.
  • the hash code of x must remain stable while x is in a hash table.

Those are requirements for correctness. If you can't guarantee those two simple things then your program will not be correct.

You propose two solutions.

Your first solution is that you always return a constant. That meets the requirement of both rules, but you are then reduced to linear searches in your hash table. You might as well use a list.
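To make the cost of a constant hash code concrete, here is a rough sketch that counts how many times Equals runs during a single failed lookup. ConstantHashItem and its EqualsCalls counter are hypothetical names introduced only for this illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type that returns the same hash code for every instance.
public sealed class ConstantHashItem
{
    // Counts comparisons; for illustration only, not thread-safe.
    public static int EqualsCalls;

    public int Value { get; set; }

    public override bool Equals(object obj)
    {
        EqualsCalls++;
        return obj is ConstantHashItem other && Value == other.Value;
    }

    // Every instance lands in the same bucket.
    public override int GetHashCode() => 1;
}

public static class ConstantHashDemo
{
    // Fills a set with itemCount distinct items, then counts how many
    // Equals calls one lookup of a missing value needs.
    public static int CountComparisonsForMiss(int itemCount)
    {
        var set = new HashSet<ConstantHashItem>();
        for (int i = 0; i < itemCount; i++)
            set.Add(new ConstantHashItem { Value = i });

        ConstantHashItem.EqualsCalls = 0;
        set.Contains(new ConstantHashItem { Value = -1 });
        return ConstantHashItem.EqualsCalls;
    }
}
```

Because every stored item shares one hash code, the lookup has to compare against each of them in turn, which is the linear search described above. Filling the set is worse still, since each Add walks the same chain.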

The other solution you propose is to somehow produce a hash code for each object and store it in the object. That is perfectly legal provided that equal items have equal hash codes. If you do that then you are restricted such that x equals y must be false if the hash codes differ. This seems to make value equality basically impossible. Since you wouldn't be overriding Equals in the first place if you wanted reference equality, this seems like a really bad idea, but it is legal provided that Equals is consistent.

I propose a third solution, which is: never put your object in a hash table, because a hash table is the wrong data structure in the first place. The point of a hash table is to quickly answer the question "is this given value in this set of immutable values?" and you don't have a set of immutable values, so don't use a hash table. Use the right tool for the job. Use a list, and live with the pain of doing linear searches.

A fourth solution is: hash on the mutable fields used for equality, remove the object from every hash table it is in just before each mutation, and put it back in afterwards. This meets both requirements: the hash code agrees with equality, and hashes of objects in hash tables are stable, and you still get fast lookups.
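A sketch of the remove, mutate, re-add pattern from the fourth solution. The Part class and the HashSetMutation.Mutate helper are hypothetical names, and the sketch assumes the caller knows every set that holds the object:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type that hashes on the same mutable field it uses for equality.
public sealed class Part
{
    public int Number { get; set; }

    public override bool Equals(object obj) =>
        obj is Part other && Number == other.Number;

    public override int GetHashCode() => Number;
}

public static class HashSetMutation
{
    // Applies a change to an item that may be stored in the given set,
    // keeping the item's hash code stable while it is inside the set.
    public static void Mutate<T>(ISet<T> set, T item, Action<T> change)
    {
        bool wasPresent = set.Remove(item);  // remove while the old hash code is still valid
        change(item);                        // mutate while the item is outside the set
        if (wasPresent)
            set.Add(item);                   // re-insert under the new hash code
    }
}
```

For example, `HashSetMutation.Mutate(parts, part, p => p.Number = 2)` leaves part findable in parts afterwards. If the mutated item becomes equal to another item already in the set, Add returns false and the set keeps only the existing one, which is ordinary set semantics.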

I would either create an additional readonly field or else throw NotSupportedException. In my view the other option is meaningless. Let's see why.

Distinct (fixed) hash codes

Providing distinct hash codes is easy, e.g.:

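class Sample
{
    // Shared counter that hands out the next hash code (not thread-safe).
    private static int counter;

    // Fixed at construction, so it never changes while the object is in a hash table.
    private readonly int hashCode;

    public Sample() { this.hashCode = counter++; }

    public override int GetHashCode()
    {
        return this.hashCode;
    }

    // Reference equality: two distinct instances never compare equal.
    public override bool Equals(object other)
    {
        return object.ReferenceEquals(this, other);
    }
}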

Technically you have to look out for creating too many objects and overflowing the counter here, but in practice I think that's not going to be an issue for anyone.

The problem with this approach is that distinct instances will never compare equal. However, that's perfectly fine if you only want to use instances of Sample as indexes into a collection of some other type.

Constant hash codes

If there is any scenario in which distinct instances should compare equal, then at first glance you have no other choice than returning a constant. But where does that leave you?

Locating an instance inside a container will always degenerate to the equivalent of a linear search. So in effect, by returning a constant you allow the user to make a keyed container for your class, but that container will exhibit the performance characteristics of a LinkedList<T>. This might be obvious to someone familiar with your class, but personally I see it as letting people shoot themselves in the foot. If you know beforehand that a Dictionary won't behave as one might expect, then why let the user create one? In my view, it is better to throw NotSupportedException.

But throwing is what you must not do!

Some people will disagree with the above, and when those people are smarter than oneself then one should pay attention. First of all, this code analysis warning states that GetHashCode should not throw. That's something to think about, but let's not be dogmatic. Sometimes you have to break the rules for a reason.

However, that is not all. In his blog post on the subject, Eric Lippert says that if you throw from inside GetHashCode then

your object cannot be a result in many LINQ-to-objects queries that use hash tables internally for performance reasons.

Losing LINQ is certainly a bummer, but fortunately the road does not end here. Many (all?) LINQ methods that use hash tables have overloads that accept an IEqualityComparer<T> to be used when hashing. So you can in fact use LINQ, but it's going to be less convenient.

In the end you will have to weigh the options yourself. My opinion is that it's better to operate with a whitelist strategy (provide an IEqualityComparer<T> whenever needed) as long as it is technically feasible, because that makes the code explicit. If someone tries to use the class naively, they get an exception that helpfully tells them what's going on. And the equality comparer is visible in the code wherever it is used, making the extraordinary behavior of the class immediately clear.
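A rough sketch of that whitelist strategy, using hypothetical Customer and CustomerNameComparer types. GetHashCode throws, so naive use in a hash-based collection fails loudly, while call sites that pass the comparer explicitly keep working:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mutable type that refuses to be hashed implicitly.
public sealed class Customer
{
    public string Name { get; set; } = "";

    public override bool Equals(object obj) =>
        obj is Customer other && Name == other.Name;

    public override int GetHashCode() =>
        throw new NotSupportedException(
            "Customer is mutable; pass an explicit IEqualityComparer<Customer>.");
}

// Opt-in comparer: whoever uses it takes responsibility for not
// changing Name while the customer is inside a hash-based collection.
public sealed class CustomerNameComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.Name == y.Name);

    public int GetHashCode(Customer obj) => obj.Name.GetHashCode();
}

public static class ComparerDemo
{
    // Distinct has an overload taking an IEqualityComparer<T>, so
    // Customer.GetHashCode is never called on this path.
    public static int CountDistinctNames(IEnumerable<Customer> customers) =>
        customers.Distinct(new CustomerNameComparer()).Count();
}
```

Dictionary<TKey, TValue> and HashSet<T> accept the same comparer through their constructors, for example `new HashSet<Customer>(new CustomerNameComparer())`.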

Where I want to override Equals, but there is no sensible immutable "key" for an object (and for whatever reason it doesn't make sense to make the whole object immutable), in my opinion there is only one "correct" choice:

  • Implement GetHashCode to hash the same fields as Equals uses. (This might be all the fields.)
  • Document that these fields must not be altered while in a dictionary.
  • Trust that users either don't put these objects in dictionaries, or obey the second rule.

(Returning a constant value compromises dictionary performance. Throwing an exception disallows too many useful cases where objects are cached but not modified. Any other implementation for GetHashCode would be wrong.)
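As an illustration of the three points above, here is a minimal sketch with a hypothetical PostalAddress type. It assumes a runtime that provides System.HashCode (.NET Core 2.1 or later); on older frameworks the two hash codes would need to be combined by hand:

```csharp
using System;

/// <summary>
/// Hypothetical mutable model type with value equality.
/// Street and City must not be changed while an instance is used as a
/// dictionary key or stored in a hash set.
/// </summary>
public sealed class PostalAddress
{
    public string Street { get; set; } = "";
    public string City { get; set; } = "";

    // Equality is defined over exactly these two properties...
    public override bool Equals(object obj) =>
        obj is PostalAddress other
        && Street == other.Street
        && City == other.City;

    // ...so the hash code is computed from the same two properties.
    public override int GetHashCode() => HashCode.Combine(Street, City);
}
```

Keeping both methods over the same set of fields is what guarantees the first rule (equal objects have equal hash codes); the documentation comment is the only thing enforcing the second.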

Where this runs the user into trouble anyway, it's probably their fault. (Specifically: using a dictionary where they shouldn't, or using a model type in a context where they should be using a view-model type that uses reference equality instead.)

Or perhaps I shouldn't be overriding Equals in the first place.

If the classes truly contain nothing constant on which a hash value can be calculated then I would use something simpler than a GUID. Just use a random number persisted in the class (or in a wrapper class).

A simple approach is to store the hash code in a private member and generate it on first use. If your entity doesn't change often, and you're not going to use two different objects that are equal (where your Equals method returns true) as keys in your dictionary, then this should be fine:

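// Cached hash code; null until GetHashCode is first called.
private int? _hashCode;

public override int GetHashCode() {
   if (!_hashCode.HasValue)
      // Property1 and Property2 are placeholders: combine the hash codes of
      // whatever properties your Equals method compares.
      _hashCode = Property1.GetHashCode() ^ Property2.GetHashCode();
   return _hashCode.Value;
}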

However, suppose you have, say, object a and object b, where a.Equals(b) == true, and you store an entry in your dictionary using a as the key (dictionary[a] = value). If a does not change, then dictionary[b] will return value. If you change a after storing the entry in the dictionary, though, then dictionary[b] will most likely fail. The only workaround for this is to rehash the dictionary when any of the keys change.
