
Is it safe to override GetHashCode and get it from a string property?

I have a class:

public class Item
{
    public string Name { get; set; }

    public override int GetHashCode()
    {
        return Name.GetHashCode();
    }
}

The purpose of overriding GetHashCode is that I want to have only one occurrence of an object with a specified name in a Dictionary.

But is it safe to get the hash code from a string? In other words, is there any chance that two objects with different values of the Name property would return the same hash code?

But is it safe to get hash code from string?

Yes, it is safe. But what you're doing isn't. You're using a mutable string property to generate your hash code. Let's imagine that you inserted an Item as a key for a given value. Then someone changes the Name string to something else. You are now no longer able to find the same Item inside your Dictionary, HashSet, or whichever structure you use.
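
The lost-key problem described above can be reproduced with a minimal sketch. MutableItem and MutableKeyDemo are hypothetical names introduced only for this illustration; the class mirrors the question's Item but adds an Equals override so that equality and hashing agree.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's Item, with a settable Name.
public class MutableItem
{
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as MutableItem;
        return other != null && Name == other.Name;
    }

    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}

public static class MutableKeyDemo
{
    // Inserts a key, renames it, then looks it up under the old name.
    public static bool FoundByOldName()
    {
        var key = new MutableItem { Name = "apple" };
        var dict = new Dictionary<MutableItem, int> { { key, 1 } };

        key.Name = "banana"; // the dictionary is not told about this change

        // The hash matches the stored entry, but Equals now compares
        // "banana" with "apple", so the entry is not considered a match.
        return dict.ContainsKey(new MutableItem { Name = "apple" });
    }

    public static void Main()
    {
        Console.WriteLine(FoundByOldName()); // False
    }
}
```

Looking the entry up under the new name is also expected to fail, because the lookup starts from the hash code of the new name while the entry was filed under the hash code of the old one.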

Moreover, you should rely only on immutable types for keys. I'd also advise you to implement IEquatable<T>:

public class Item : IEquatable<Item>
{
    public Item(string name)
    {
        Name = name;
    }

    public string Name { get; }

    public bool Equals(Item other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return string.Equals(Name, other.Name);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((Item) obj);
    }

    public static bool operator ==(Item left, Item right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(Item left, Item right)
    {
        return !Equals(left, right);
    }

    public override int GetHashCode()
    {
        return (Name != null ? Name.GetHashCode() : 0);
    }
}

is there any chance that two objects with different values of property Name would return the same hash code?

Yes, there is a statistical chance that such a thing will happen. Hash codes do not guarantee uniqueness; they strive for a uniform distribution. Why? Because the result is an Int32, which has only 32 bits, while there are far more possible strings than 32-bit values. By the pigeonhole principle, you may end up with two different strings that have the same hash code.
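
To make this concrete, here is a rough sketch that searches for two different strings sharing a hash code. CollisionSearch and FindCollision are hypothetical names, and the candidate strings ("item0", "item1", ...) are arbitrary. Which pair is found, and how quickly, depends on the runtime, since string hash codes are not stable across .NET versions or even across processes.

```csharp
using System;
using System.Collections.Generic;

public static class CollisionSearch
{
    // Returns two different strings with the same hash code, or null if
    // none is found within maxCandidates attempts.
    public static Tuple<string, string> FindCollision(int maxCandidates)
    {
        // Maps each hash code seen so far to the first string that produced it.
        var firstSeen = new Dictionary<int, string>();

        for (int i = 0; i < maxCandidates; i++)
        {
            string candidate = "item" + i; // every candidate is distinct
            int hash = candidate.GetHashCode();

            string earlier;
            if (firstSeen.TryGetValue(hash, out earlier))
            {
                return Tuple.Create(earlier, candidate);
            }
            firstSeen[hash] = candidate;
        }
        return null;
    }

    public static void Main()
    {
        var pair = FindCollision(2000000);
        if (pair == null)
        {
            Console.WriteLine("No collision found in this run.");
        }
        else
        {
            Console.WriteLine(pair.Item1 + " and " + pair.Item2 +
                              " share hash code " + pair.Item1.GetHashCode());
        }
    }
}
```

With a well-distributed 32-bit hash, the birthday effect means a collision typically shows up after roughly a hundred thousand candidates, far fewer than the four billion or so possible hash values.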

Your class is buggy, because you have a GetHashCode override but no Equals override. You also don't consider the case where Name is null.

The rule for GetHashCode is simple:

If a.Equals(b) then it must be the case that a.GetHashCode() == b.GetHashCode().

The more cases where !a.Equals(b) implies a.GetHashCode() != b.GetHashCode(), the better. Indeed, the more cases where !a.Equals(b) implies a.GetHashCode() % SomeValue != b.GetHashCode() % SomeValue, the better, for any given SomeValue (you can't predict it), so we like to have a good mix of bits in the results. But the vital thing is that two objects considered equal must have equal GetHashCode() results.

Right now this isn't the case, because you've only overridden one of these. However, the following is sensible:

public class Item
{
  public string Name { get; set; }

  public override int GetHashCode()
  {
      return Name == null ? 0 : Name.GetHashCode();
  }
  public override bool Equals(object obj)
  {
    var asItem = obj as Item;
    // Compare against the cast reference; obj is typed as object and has no Name.
    return asItem != null && Name == asItem.Name;
  }
}

The following is even better, because it allows for faster strongly-typed equality comparisons:

public class Item : IEquatable<Item>
{
  public string Name { get; set; }

  public override int GetHashCode()
  {
      return Name == null ? 0 : Name.GetHashCode();
  }
  public bool Equals(Item other)
  {
    return other != null && Name == other.Name;
  }
  public override bool Equals(object obj)
  {
    return Equals(obj as Item);
  }
}

In other words, is there any chance that two objects with different values of property Name would return the same hash code?

Yes, this can happen, but it won't happen often, so that's fine. Hash-based collections like Dictionary and HashSet can handle a few collisions; indeed, there will be collisions even if the hash codes are all different, because they are reduced modulo a smaller number to get an index. It's only if this happens a lot that it impacts performance.

Another danger is that you'll be using a mutable value as a key. There's a myth that you shouldn't base hash codes on mutable values, which isn't true: if a mutable object has a mutable property that affects what it is considered equal to, then changing that property must also change the hash code.

The real danger is mutating an object at all while it is a key in a hash collection. If you define equality based on Name and you have such an object as the key to a dictionary, then you must not change Name while it is used as such a key. The easiest way to ensure that is to make Name immutable, so that is definitely a good idea if possible. If it is not possible, though, you need to be careful about just when you allow Name to be changed.
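
If Name has to stay mutable, one way to be careful is to take the key out of the dictionary before renaming it and put it back afterwards. The following is only a sketch of that idea; NamedKey, RenameDemo and RenameKey are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose equality depends on a mutable Name.
public class NamedKey : IEquatable<NamedKey>
{
    public string Name { get; set; }

    public bool Equals(NamedKey other)
    {
        return other != null && Name == other.Name;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as NamedKey);
    }

    public override int GetHashCode()
    {
        return Name == null ? 0 : Name.GetHashCode();
    }
}

public static class RenameDemo
{
    // Renames a key without ever mutating it while it sits in the dictionary.
    // Returns false if the key was not present.
    public static bool RenameKey(Dictionary<NamedKey, int> dict, NamedKey key, string newName)
    {
        int value;
        if (!dict.TryGetValue(key, out value)) return false;

        dict.Remove(key);    // remove while the old hash code is still valid
        key.Name = newName;  // now it is safe to mutate
        dict[key] = value;   // re-insert; overwrites the value if an equal key already exists
        return true;
    }

    public static void Main()
    {
        var key = new NamedKey { Name = "apple" };
        var dict = new Dictionary<NamedKey, int> { { key, 1 } };

        RenameKey(dict, key, "banana");

        Console.WriteLine(dict.ContainsKey(new NamedKey { Name = "banana" })); // True
        Console.WriteLine(dict.ContainsKey(new NamedKey { Name = "apple" }));  // False
    }
}
```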

From a comment:

So, even if there is a collision in hash codes, when Equals will return false (because the names are different), the Dictionary will handle it properly?

Yes, it will handle it, though it's not ideal. We can test this with a class like this:

public class SuckyHashCode : IEquatable<SuckyHashCode>
{
  public int Value { get; set; }
  public bool Equals(SuckyHashCode other)
  {
    return other != null && other.Value == Value;
  }
  public override bool Equals(object obj)
  {
    return Equals(obj as SuckyHashCode);
  }
  public override int GetHashCode()
  {
    return 0;
  }
}

Now if we use this, it works:

var dict = Enumerable.Range(0, 1000).Select(i => new SuckyHashCode{Value = i}).ToDictionary(shc => shc);
Console.WriteLine(dict.ContainsKey(new SuckyHashCode{Value = 3})); // True
Console.WriteLine(dict.ContainsKey(new SuckyHashCode{Value = -1})); // False

However, as the name suggests, it isn't ideal. Dictionaries and other hash-based collections all have means to deal with collisions, but those means mean that we no longer have the great, nearly O(1) look-up; rather, as the percentage of collisions grows, the look-up approaches O(n). In the case above, where GetHashCode is as bad as it could be without actually throwing an exception, the look-up would be O(n), which is the same as just putting all the items into an unordered collection and then finding them by looking at every one to see if it matches (indeed, due to differences in overheads, it's actually worse than that).

So for this reason we always want to avoid collisions as much as possible. Indeed, we want not just to avoid collisions, but to avoid collisions after the result has been reduced modulo a smaller number to make a smaller hash code (because that's what happens internally in the dictionary).
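
The effect of that reduction can be shown with plain arithmetic. The sketch below uses a simplified model of bucket selection (clear the sign bit, then take the remainder); BucketDemo, BucketIndex and the bucket count of 17 are assumptions made for illustration, and the real Dictionary chooses its own sizes and may compute the index differently.

```csharp
using System;

public static class BucketDemo
{
    // Simplified model: map a hash code to one of bucketCount buckets.
    public static int BucketIndex(int hashCode, int bucketCount)
    {
        return (hashCode & 0x7FFFFFFF) % bucketCount;
    }

    public static void Main()
    {
        int first = 3;   // two different hash codes...
        int second = 20;
        int bucketCount = 17;

        Console.WriteLine(first != second);                  // True
        Console.WriteLine(BucketIndex(first, bucketCount));  // 3
        Console.WriteLine(BucketIndex(second, bucketCount)); // 3, the same bucket
    }
}
```

Two keys with these hash codes would share a bucket even though their hash codes differ, which is why a good mix of bits matters and not merely distinct values.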

In your case, though, because string.GetHashCode() is reasonably good at avoiding collisions, and because that one string is the only thing that equality is defined by, your code would in turn be reasonably good at avoiding collisions. More collision-resistant code is certainly possible, but it comes at a cost to performance in the code itself* and/or is more work than can be justified.

*(Though see https://www.nuget.org/packages/SpookilySharp/ for code of mine that is faster than string.GetHashCode() on large strings on 64-bit .NET and more collision-resistant, though it is slower to produce those hash codes on 32-bit .NET or when the string is short.)

Instead of using GetHashCode to prevent duplicates from being added to a dictionary, which is risky in your case as already explained, I would recommend using a (custom) equality comparer for your dictionary.

If the key is an object, you should create your own equality comparer that compares the string Name value. If the key is the string itself, you can use StringComparer.CurrentCulture, for example.

In this case, too, it is key to make the string immutable, since otherwise you might invalidate your dictionary by changing the Name.
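
A custom comparer of the kind suggested above might look like the following sketch. PlainItem and ItemNameComparer are hypothetical names, and the choice of ordinal (culture-insensitive, case-sensitive) comparison is an assumption; substitute another StringComparer if different semantics are needed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type with an immutable Name and no equality overrides.
public sealed class PlainItem
{
    public PlainItem(string name)
    {
        Name = name;
    }

    public string Name { get; }
}

// Defines equality and hashing for PlainItem by Name only.
public sealed class ItemNameComparer : IEqualityComparer<PlainItem>
{
    public bool Equals(PlainItem x, PlainItem y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        return StringComparer.Ordinal.Equals(x.Name, y.Name);
    }

    public int GetHashCode(PlainItem obj)
    {
        if (obj == null || obj.Name == null) return 0;
        return StringComparer.Ordinal.GetHashCode(obj.Name);
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var dict = new Dictionary<PlainItem, int>(new ItemNameComparer());
        dict.Add(new PlainItem("apple"), 1);

        // A different instance with the same Name counts as the same key.
        Console.WriteLine(dict.ContainsKey(new PlainItem("apple")));  // True
        Console.WriteLine(dict.ContainsKey(new PlainItem("banana"))); // False
    }
}
```

Keeping the comparison in a comparer leaves the class itself untouched, so different dictionaries can apply different notions of equality to the same type.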
