Implementing GetHashCode in C#. Null-value handling

Before I begin: I tested all code samples here on Mono, and there is one noticeable difference between the GetHashCode implementations:

string.Empty.GetHashCode(); // returns 0 in Mono 3.10
string.Empty.GetHashCode(); // returns 757602046 in .NET 4.5.1

I based my implementation on this SO answer by @JonSkeet. In the comments he also suggests using 0 as the hash code for null values (I wasn't sure how I should hash them):

"I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field."

So, given the following implementation (Mono 3.10):

public class Entity {
    public int EntityID { get; set; }
    public string EntityName { get; set; }

    public override int GetHashCode() {
        unchecked {
            int hash = 15485863;       // prime number
            int multiplier = 1299709;  // another prime number

            hash = hash * multiplier + EntityID.GetHashCode();
            hash = hash * multiplier + (EntityName != null ? EntityName.GetHashCode() : 0);

            return hash;
        }
    }
}

It is quite easy to find a collision, e.g.:

var hash1 = new Entity { EntityID = 1337, EntityName = "" }.GetHashCode();
var hash2 = new Entity { EntityID = 1337, EntityName = null }.GetHashCode();

bool equals = hash1 == hash2; // true on Mono 3.10, where "".GetHashCode() returns 0

I could replace the null-value 0 with some other number, but that won't fix the problem: there is still a chance that some string hashes to exactly that number, and I'd get another collision.

My question: How should I handle null values when using the algorithm from the example above?

I don't think the problem is with null per se. The problem lies in the fact that you're using GetHashCode for equality, which it isn't meant for. GetHashCode should provide hashes that aspire to an even (uniform) distribution.

The docs say:

"Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality, because different (unequal) objects can have identical hash codes."

They then go on to specify the purpose of GetHashCode:

"A hash code is intended for efficient insertion and lookup in collections that are based on a hash table."

You should implement IEquatable<Entity>, which is where you actually define the equivalence relation between two entities. Overload == and != while you're at it.

An approximation:

public class Entity : IEquatable<Entity>
{
    public int EntityId { get; set; }
    public string EntityName { get; set; }

    public bool Equals(Entity other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return EntityId == other.EntityId && 
               string.Equals(EntityName, other.EntityName);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((Entity) obj);
    }

    public static bool operator ==(Entity left, Entity right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(Entity left, Entity right)
    {
        return !Equals(left, right);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (EntityId*397) ^ (EntityName != null ? EntityName.GetHashCode() : 0);
        }
    }
}
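
A brief usage sketch of the pattern above, assuming a trimmed-down copy of the class (renamed EntityKey here, a hypothetical name, so that it does not clash with Entity). It illustrates that a null name and an empty name compare as unequal through Equals and the operators, whatever their hash codes happen to be:

```csharp
using System;

// Hypothetical, trimmed-down copy of the Entity pattern above.
public sealed class EntityKey : IEquatable<EntityKey>
{
    public int EntityId { get; set; }
    public string EntityName { get; set; }

    public bool Equals(EntityKey other)
    {
        if (ReferenceEquals(null, other)) return false;
        return EntityId == other.EntityId &&
               string.Equals(EntityName, other.EntityName);
    }

    // The class is sealed, so a simple cast check is enough here.
    public override bool Equals(object obj)
    {
        return Equals(obj as EntityKey);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (EntityId * 397) ^ (EntityName != null ? EntityName.GetHashCode() : 0);
        }
    }

    public static bool operator ==(EntityKey left, EntityKey right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(EntityKey left, EntityKey right)
    {
        return !Equals(left, right);
    }
}

public static class EqualityDemo
{
    public static void Main()
    {
        var empty = new EntityKey { EntityId = 1337, EntityName = "" };
        var missing = new EntityKey { EntityId = 1337, EntityName = null };
        var emptyAgain = new EntityKey { EntityId = 1337, EntityName = "" };

        Console.WriteLine(empty == missing);    // False: "" and null are different names
        Console.WriteLine(empty == emptyAgain); // True: same id and same name
        // Equal objects must report equal hash codes.
        Console.WriteLine(empty.GetHashCode() == emptyAgain.GetHashCode()); // True
    }
}
```

Whether the hash codes of the "" and null entities collide depends on the runtime's string hash; the equality result does not.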

Your "problem" here is that you are trying to get collision-free hash codes. While that would be perfect for the lookup performance of collections that use the hash code for lookup (e.g. HashSet and Dictionary), in most cases it is not achievable.

The reason is that the hash code is just a 32-bit integer value, and it represents data that is usually a lot bigger (multiple integer values, strings, etc.).
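
To make the size argument concrete, here is a minimal sketch with a hypothetical Pair struct (not part of the question). It holds 64 bits of state but has to squeeze them into a 32-bit hash, so distinct values are bound to share hash codes. With the simple combining formula assumed here, a colliding pair can be constructed by hand:

```csharp
using System;

// Hypothetical value type: two ints (64 bits of state), one 32-bit hash.
public struct Pair : IEquatable<Pair>
{
    public readonly int A;
    public readonly int B;

    public Pair(int a, int b) { A = a; B = b; }

    public bool Equals(Pair other) { return A == other.A && B == other.B; }

    public override bool Equals(object obj) { return obj is Pair && Equals((Pair)obj); }

    public override int GetHashCode()
    {
        unchecked { return A * 31 + B; }
    }
}

public static class PigeonholeDemo
{
    public static void Main()
    {
        var x = new Pair(0, 31); // 0 * 31 + 31 = 31
        var y = new Pair(1, 0);  // 1 * 31 + 0  = 31

        Console.WriteLine(x.GetHashCode() == y.GetHashCode()); // True: the hashes collide
        Console.WriteLine(x.Equals(y));                        // False: the values differ
    }
}
```

A better mixing formula makes such collisions harder to stumble upon, but it cannot remove them.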

So the hash code only indicates that two objects could be equal. The collection classes use the hash code to narrow down the bucket where an object is stored, and then use the Equals method to determine whether two objects are really equal. For that reason you should always implement Equals for any class for which you implement GetHashCode. While those collections will fall back to Object.Equals, it is also a good idea to implement the IEquatable<T> interface to avoid type-related problems (and still override the default Equals method of Object!).
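
The bucket-then-Equals behaviour can be sketched as follows. WeakKey is a hypothetical type whose hash deliberately ignores the name, so the "" and null keys from the question always collide; a Dictionary still keeps them apart because it consults Equals within the bucket:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: the hash uses only Id, so keys with the
// same Id always land in the same bucket.
public sealed class WeakKey : IEquatable<WeakKey>
{
    public int Id { get; private set; }
    public string Name { get; private set; }

    public WeakKey(int id, string name) { Id = id; Name = name; }

    public bool Equals(WeakKey other)
    {
        return !ReferenceEquals(null, other) &&
               Id == other.Id &&
               string.Equals(Name, other.Name);
    }

    public override bool Equals(object obj) { return Equals(obj as WeakKey); }

    public override int GetHashCode() { return Id; }
}

public static class BucketDemo
{
    public static void Main()
    {
        var map = new Dictionary<WeakKey, string>();
        map[new WeakKey(1337, "")] = "empty name";
        map[new WeakKey(1337, null)] = "null name";

        Console.WriteLine(map.Count);                    // 2
        Console.WriteLine(map[new WeakKey(1337, "")]);   // empty name
        Console.WriteLine(map[new WeakKey(1337, null)]); // null name
    }
}
```

Collisions like this cost lookup time when there are many of them, but they do not affect correctness as long as Equals is implemented properly.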
