GetHashCode Extension Method

After reading all the questions and answers on StackOverflow about overriding GetHashCode(), I wrote the following extension method to make overriding GetHashCode() easy and convenient:

public static class ObjectExtensions
{
    private const int _seedPrimeNumber = 691;
    private const int _fieldPrimeNumber = 397;
    public static int GetHashCodeFromFields(this object obj, params object[] fields) {
        unchecked { //unchecked to prevent throwing overflow exception
            int hashCode = _seedPrimeNumber;
            for (int i = 0; i < fields.Length; i++)
                if (fields[i] != null)
                    hashCode *= _fieldPrimeNumber + fields[i].GetHashCode();
            return hashCode;
        }
    }
}

(I basically just refactored code that someone posted there, because I really like that it can be used generally.)

which I use like this:

    public override int GetHashCode() {
        return this.GetHashCodeFromFields(field1, field2, field3);
    }

Do you see any problems with this code?

I wrote something a little while back that might solve your problem. (Actually, it could probably be improved to include the seed that you have.)

Anyway, the project is called Essence (http://essence.codeplex.com/), and it uses the System.Linq.Expressions libraries to generate (based on attributes) standard implementations of Equals/GetHashCode/CompareTo/ToString. It can also create IEqualityComparer and IComparer classes based on an argument list. (I have some further ideas, but would like to get some community feedback before going much further.)

(This means it's almost as fast as handwritten code. The main exception is CompareTo(): Linq.Expressions in the .NET 3.5 release has no concept of a variable, so you have to call CompareTo() on the underlying object twice when you don't get a match. Using the DLR extensions to Linq.Expressions solves this. I suppose I could have emitted IL instead, but I wasn't that inspired at the time.)

It's quite a simple idea, but I haven't seen it done before.

Now, the thing is, I kind of lost interest in polishing it (which would have included writing an article for CodeProject, documenting some of the code, and so on), but I might be persuaded to do so if you feel it would be of interest.

(The CodePlex site doesn't have a downloadable package; just go to the source and grab that. Also note that it's written in F# (although all the test code is in C#), as that was the language I was interested in learning.)

Anyway, here is a C# example from the tests in the project:

    // --------------------------------------------------------------------
    // USING THE ESSENCE LIBRARY:
    // --------------------------------------------------------------------
    [EssenceClass(UseIn = EssenceFunctions.All)]
    public class TestEssence : IEquatable<TestEssence>, IComparable<TestEssence>
    {
        [Essence(Order=0)] public int MyInt           { get; set; }
        [Essence(Order=1)] public string MyString     { get; set; }
        [Essence(Order=2)] public DateTime MyDateTime { get; set; }

        public override int GetHashCode() { return Essence<TestEssence>.GetHashCodeStatic(this); }
        // ...
    }

    // --------------------------------------------------------------------
    // EQUIVALENT HAND WRITTEN CODE:
    // --------------------------------------------------------------------
    public class TestManual
    {
        public int MyInt;
        public string MyString;
        public DateTime MyDateTime;

        public override int GetHashCode()
        {
            var x = MyInt.GetHashCode();
            x *= Essence<TestEssence>.HashCodeMultiplier;
            x ^= (MyString == null) ? 0 : MyString.GetHashCode();
            x *= Essence<TestEssence>.HashCodeMultiplier;
            x ^= MyDateTime.GetHashCode();
            return x;
        }
        // ...
    }

Anyway, if anyone thinks the project is worthwhile, it needs polishing, but the ideas are there.

That looks like a solid way to do it.

My only suggestion is that if you're really concerned about performance, you may want to add generic versions for several common cases (i.e. probably 1-4 arguments). That way, for those objects (which are most likely to be small, key-style composite objects), you won't have the overhead of building the array to pass to the method, the loop, any boxing of generic values, etc. The call syntax will be exactly the same, but you'll run slightly more optimized code for that case. Of course, I'd run some performance tests before deciding whether it's worth the maintenance trade-off.

Something like this:

public static int GetHashCodeFromFields<T1,T2,T3,T4>(this object obj, T1 obj1, T2 obj2, T3 obj3, T4 obj4) {
    unchecked { // same overflow behavior as the params version
        int hashCode = _seedPrimeNumber;
        if (obj1 != null)
            hashCode *= _fieldPrimeNumber + obj1.GetHashCode();
        if (obj2 != null)
            hashCode *= _fieldPrimeNumber + obj2.GetHashCode();
        if (obj3 != null)
            hashCode *= _fieldPrimeNumber + obj3.GetHashCode();
        if (obj4 != null)
            hashCode *= _fieldPrimeNumber + obj4.GetHashCode();
        return hashCode;
    }
}

It looks pretty good to me. I only have one issue: it's a shame that you have to use an object[] to pass in the values, as this will box any value types you send to the function. I don't think you have much of a choice, though, unless you go the route of creating some generic overloads as others have suggested.

One problem that can arise is that once the multiplication hits 0, the final hashCode is always 0 (every later multiplication keeps it at 0), as I just experienced with an object with a lot of properties. The culprit is this line:

hashCode *= _fieldPrimeNumber + fields[i].GetHashCode();

I'd suggest:

hashCode = hashCode * _fieldPrimeNumber + fields[i].GetHashCode();

Or something similar with XOR, like this:

hashCode = (hashCode * _fieldPrimeNumber) ^ fields[i].GetHashCode();
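To see why the original formula collapses, note that int arithmetic wraps modulo 2^32. A field whose hash code is exactly -397 zeroes the product at once, but the more common cause is that every field with an odd hash code contributes an even factor (397 + odd), so after 32 such fields the product is divisible by 2^32 and wraps to 0. The sketch below puts the question's formula next to the suggested multiply-then-add one; the class and method names are made up for illustration.

```csharp
// Illustrative comparison of the two combining formulas (hypothetical names).
public static class HashFormulaDemo
{
    private const int SeedPrimeNumber = 691;
    private const int FieldPrimeNumber = 397;

    // Formula from the question: hashCode *= 397 + fieldHash.
    public static int Multiplicative(params object[] fields)
    {
        unchecked
        {
            int hashCode = SeedPrimeNumber;
            foreach (object field in fields)
                if (field != null)
                    hashCode *= FieldPrimeNumber + field.GetHashCode();
            return hashCode;
        }
    }

    // Suggested fix: hashCode = hashCode * 397 + fieldHash.
    public static int MultiplyThenAdd(params object[] fields)
    {
        unchecked
        {
            int hashCode = SeedPrimeNumber;
            foreach (object field in fields)
                if (field != null)
                    hashCode = hashCode * FieldPrimeNumber + field.GetHashCode();
            return hashCode;
        }
    }
}
```

With 32 fields that all hash to 1 (int.GetHashCode() returns the value itself), each step of Multiplicative multiplies by 398 = 2 * 199, so the result is 691 * 398^32 mod 2^32 = 0. MultiplyThenAdd alternates between odd and even values and ends on an odd value after an even number of steps, so for that input it cannot be 0.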

On general principle, you should scope your unchecked block as narrowly as you reasonably can, though it doesn't matter much here. Other than that, it looks fine.
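A sketch of what the narrower scope could look like: the unchecked(...) operator wraps only the multiplication that is expected to overflow, so the loop counter and indexing keep whatever overflow checking the project is compiled with. The hash formula itself is left exactly as in the question.

```csharp
public static class ObjectExtensions
{
    private const int _seedPrimeNumber = 691;
    private const int _fieldPrimeNumber = 397;

    public static int GetHashCodeFromFields(this object obj, params object[] fields)
    {
        int hashCode = _seedPrimeNumber;
        for (int i = 0; i < fields.Length; i++)
        {
            if (fields[i] != null)
            {
                // Only the arithmetic that is meant to wrap around is unchecked;
                // the rest of the method uses the project's normal overflow setting.
                hashCode = unchecked(hashCode * (_fieldPrimeNumber + fields[i].GetHashCode()));
            }
        }
        return hashCode;
    }
}
```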

public override int GetHashCode() {
    return this.GetHashCodeFromFields(field1, field2, field3, this);
}

(Yes, I'm very pedantic, but this is the only problem I see.)

A more optimal approach:

  1. Create a code generator that uses reflection to look through your business object fields and creates a new partial class which overrides GetHashCode() (and Equals()).
  2. Run the code generator when your program starts up in debug mode, and if the code has changed, exit with a message telling the developer to recompile.

The advantages of this are:

  • Using reflection, you know which fields are value types and therefore whether they need null checks.
  • There is no overhead: no extra function calls, no array construction, etc. This is important if you are doing lots of dictionary lookups.
  • Long implementations (in classes with lots of fields) are hidden in partial classes, away from your important business code.

Disadvantages:

  • Overkill if you don't do lots of dictionary lookups or calls to GetHashCode().
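A rough sketch of the generator step, assuming a non-generic, non-nested business class that is declared partial. HashCodeGenerator.Generate and SampleCustomer are hypothetical names, and the emitted body reuses the 691/397 primes from the question with the multiply-then-add combining step suggested in another answer. It illustrates the first advantage: reflection reports whether each field is a value type, so null checks are only emitted for reference-type fields. Writing the file, comparing it with the previous version and the startup check are left out.

```csharp
using System;
using System.Reflection;
using System.Text;

// Hypothetical generator: emits C# source for a partial class overriding GetHashCode().
public static class HashCodeGenerator
{
    public static string Generate(Type type)
    {
        var sb = new StringBuilder();
        if (type.Namespace != null)
            sb.AppendLine("namespace " + type.Namespace + " {");
        sb.AppendLine("partial class " + type.Name);
        sb.AppendLine("{");
        sb.AppendLine("    public override int GetHashCode()");
        sb.AppendLine("    {");
        sb.AppendLine("        unchecked");
        sb.AppendLine("        {");
        sb.AppendLine("            int hashCode = 691;");

        FieldInfo[] fields = type.GetFields(
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly);
        foreach (FieldInfo field in fields)
        {
            // Auto-property backing fields are named "<Prop>k__BackingField";
            // reference the property instead, since that name is not a valid identifier.
            string member = field.Name;
            if (member[0] == '<')
            {
                int close = member.IndexOf('>');
                if (close < 0) continue;
                member = member.Substring(1, close - 1);
            }

            // Value types (including Nullable<T>) never need a null check.
            string fieldHash = field.FieldType.IsValueType
                ? member + ".GetHashCode()"
                : "(" + member + " == null ? 0 : " + member + ".GetHashCode())";
            sb.AppendLine("            hashCode = hashCode * 397 + " + fieldHash + ";");
        }

        sb.AppendLine("            return hashCode;");
        sb.AppendLine("        }");
        sb.AppendLine("    }");
        sb.AppendLine("}");
        if (type.Namespace != null)
            sb.AppendLine("}");
        return sb.ToString();
    }
}

// Hypothetical business class; it must be partial for the generated part to compile.
public partial class SampleCustomer
{
    public int Id;
    public string Name;
    public DateTime? LastOrder { get; set; }
}
```

For SampleCustomer this emits Id.GetHashCode() and LastOrder.GetHashCode() without null checks (a Nullable<T> without a value hashes to 0) and a null-checked expression for Name. A real generator would also emit Equals() and handle escaped identifiers and generic type names.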

I should point out that you should almost never allocate while implementing GetHashCode (here are some useful blog posts about it).

The way params works (generating a new array on the fly) means this is really not a good general solution. You would be better off using one method call per field and maintaining the hash state as a variable passed to them (this also makes it easy to use better hashing functions and avalanching).
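A minimal sketch of the per-field style, assuming a hypothetical FieldHash helper: each call takes the running state and one field, so there is no params array, and the generic parameter means fields such as int or DateTime are hashed through a constrained call rather than boxed into object. The optional Finish step is the 32-bit finalizer from MurmurHash3 (fmix32), swapped in here as one example of avalanching; the post does not name a specific function.

```csharp
// Hypothetical helper: one call per field, with the running hash state passed along.
public static class FieldHash
{
    public const int Seed = 691;
    private const int Multiplier = 397;

    // Combines one field into the running state. Because T is generic, value types
    // such as int or DateTime are hashed without boxing, and no array is allocated.
    public static int Add<T>(int state, T value)
    {
        int fieldHash = value == null ? 0 : value.GetHashCode();
        return unchecked(state * Multiplier + fieldHash);
    }

    // Optional avalanche step: the MurmurHash3 32-bit finalizer (fmix32), which spreads
    // small input differences across all bits of the result.
    public static int Finish(int state)
    {
        unchecked
        {
            uint h = (uint)state;
            h ^= h >> 16;
            h *= 0x85ebca6b;
            h ^= h >> 13;
            h *= 0xc2b2ae35;
            h ^= h >> 16;
            return (int)h;
        }
    }
}
```

A GetHashCode() override would then read: return FieldHash.Finish(FieldHash.Add(FieldHash.Add(FieldHash.Seed, field1), field2)); On .NET Core 2.1 and later, the built-in System.HashCode struct (Add<T>, ToHashCode() and the static HashCode.Combine) packages the same idea with its own mixing function.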

Apart from the problems arising from using params object[] fields, I think not using the type information may be a performance issue in some situations too. Suppose two classes A and B have the same number and types of fields and implement the same interface I. If you put A and B objects into a Dictionary<I, anything>, objects with equal field values but different types will get the same hash code and end up in the same bucket. I'd probably insert a statement like hashCode ^= GetType().GetHashCode();
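One way to fold the runtime type into the extension method, as a sketch: GetHashCodeFromFieldsWithType is a hypothetical name, and the loop uses the multiply-then-add step suggested in another answer rather than the question's multiplication. Type.GetHashCode(), like other hash codes, should not be relied on across processes, which is fine for an in-memory Dictionary but not for persisted hashes.

```csharp
// Hypothetical variant of the question's extension method that also mixes in the runtime type.
public static class TypeAwareHashExtensions
{
    private const int SeedPrimeNumber = 691;
    private const int FieldPrimeNumber = 397;

    public static int GetHashCodeFromFieldsWithType(this object obj, params object[] fields)
    {
        unchecked
        {
            int hashCode = SeedPrimeNumber;
            foreach (object field in fields)
                if (field != null)
                    hashCode = hashCode * FieldPrimeNumber + field.GetHashCode();

            // Objects of different classes with equal field values now usually get
            // different hash codes instead of colliding in the same bucket.
            return hashCode ^ obj.GetType().GetHashCode();
        }
    }
}
```

An A and a B with equal field values then differ by typeof(A).GetHashCode() ^ typeof(B).GetHashCode(), which is nonzero whenever the two type hash codes differ.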

Jonathan Rupp's accepted answer deals with the params array but does not deal with boxing of value types. So, if performance is very important, I'd probably declare GetHashCodeFromFields with int parameters instead of object parameters, and pass not the fields themselves but the fields' hash codes, i.e.:

public override int GetHashCode() 
{
    return this.GetHashCodeFromFields(field1.GetHashCode(), field2.GetHashCode());
}
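A sketch of the int-based overloads this answer describes, assuming fixed-arity overloads are wanted (a params int[] version would still allocate an array). The class name is hypothetical and the combining step is the multiply-then-add variant suggested in another answer. Because the caller computes each field's hash code itself, a reference-type field needs its own null check at the call site, e.g. field2 == null ? 0 : field2.GetHashCode(); the snippet above would throw a NullReferenceException for a null field.

```csharp
// Hypothetical int-based overloads: callers pass pre-computed hash codes,
// so no value types are boxed and no params array is allocated.
public static class HashCodeIntExtensions
{
    private const int SeedPrimeNumber = 691;
    private const int FieldPrimeNumber = 397;

    public static int GetHashCodeFromFields(this object obj, int hash1, int hash2)
    {
        unchecked
        {
            int hashCode = SeedPrimeNumber;
            hashCode = hashCode * FieldPrimeNumber + hash1;
            hashCode = hashCode * FieldPrimeNumber + hash2;
            return hashCode;
        }
    }

    // Each additional arity builds on the previous one.
    public static int GetHashCodeFromFields(this object obj, int hash1, int hash2, int hash3)
    {
        unchecked
        {
            return obj.GetHashCodeFromFields(hash1, hash2) * FieldPrimeNumber + hash3;
        }
    }
}
```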
