
How should one unit test the hashCode-equals contract?

In a nutshell, the hashCode contract, according to Java's Object.hashCode():

  1. The hash code shouldn't change unless something affecting equals() changes.
  2. equals() implies the hash codes are ==.

Let's assume interest primarily in immutable data objects - their information never changes after they're constructed, so #1 is assumed to hold. That leaves #2: the problem is simply one of confirming that equals implies hash code ==.

Obviously, we can't test every conceivable data object unless that set is trivially small. So, what is the best way to write a unit test that is likely to catch the common cases?

Since the instances of this class are immutable, there are limited ways to construct such an object; this unit test should cover all of them if possible. Off the top of my head, the entry points are the constructors, deserialization, and constructors of subclasses (which should be reducible to the constructor call problem).

[I'm going to try to answer my own question via research. Input from other StackOverflowers is a welcome safety mechanism to this process.]

[This could be applicable to other OO languages, so I'm adding that tag.]

EqualsVerifier is a relatively new open source project and it does a very good job at testing the equals contract. It doesn't have the issues the EqualsTester from GSBase has. I would definitely recommend it.

My advice would be to think of why/how this might ever not hold true, and then write some unit tests which target those situations.

For example, let's say you had a custom Set class. Two sets are equal if they contain the same elements, but it's possible for the underlying data structures of two equal sets to differ if those elements are stored in a different order. For example:

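// Same elements, supplied in a different order
MySet s1 = new MySet( new String[]{"Hello", "World"} );
MySet s2 = new MySet( new String[]{"World", "Hello"} );
assertEquals(s1, s2);
// assertEquals reports both hash codes if they differ
assertEquals(s1.hashCode(), s2.hashCode());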

In this case, the order of the elements in the sets might affect their hash, depending on the hashing algorithm you've implemented. So this is the kind of test I'd write, since it tests a case where I know some hashing algorithm could produce different results for two objects I've defined to be equal.

You should use a similar standard with your own custom class, whatever that is.
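
As a rough sketch of the scenario above, assume a hypothetical MySet backed by an array that preserves insertion order. The class body, the asSet helper, and the orderSensitiveHash method are all invented for illustration and are not part of the original example. Summing the element hash codes gives an order-independent hash, while an order-sensitive one such as Arrays.hashCode is the kind of mistake the test above is meant to catch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the MySet class used in the test above.
final class MySet {
    // Elements are kept in insertion order, so two equal sets may differ internally.
    private final String[] elements;

    MySet(String[] elements) {
        this.elements = elements.clone();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof MySet)) return false;
        // Equality ignores order: compare the contents as sets.
        return asSet().equals(((MySet) other).asSet());
    }

    @Override
    public int hashCode() {
        // Addition is commutative, so the storage order cannot affect the result.
        int sum = 0;
        for (String e : asSet()) sum += e.hashCode();
        return sum;
    }

    // An order-sensitive hash, shown only to illustrate the bug being tested for.
    int orderSensitiveHash() {
        return Arrays.hashCode(elements);
    }

    private Set<String> asSet() {
        return new HashSet<>(Arrays.asList(elements));
    }
}
```

With this sketch, the two sets from the test compare equal and share a hash code, whereas orderSensitiveHash() returns different values for them, which is exactly the inconsistency the test would flag if hashCode() had been written that way.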

It's worth using the junit addons for this. Check out the class EqualsHashCodeTestCase (http://junit-addons.sourceforge.net/). You can extend it and implement createInstance and createNotEqualInstance; it will then check that the equals and hashCode methods are correct.

I would recommend the EqualsTester from GSBase. It does basically what you want. I have two (minor) problems with it though:

  • The constructor does all the work, which I don't consider to be good practice.
  • It fails when an instance of class A is equal to an instance of a subclass of class A. This is not necessarily a violation of the equals contract.

[At the time of this writing, three other answers were posted.]

To reiterate, the aim of my question is to find standard test cases that confirm hashCode and equals agree with each other. My approach to this question is to imagine the common paths taken by programmers when writing the classes in question, namely, immutable data. For example:

  1. Wrote equals() without writing hashCode(). This often means equality was defined to mean equality of the fields of two instances.
  2. Wrote hashCode() without writing equals(). This may mean the programmer was seeking a more efficient hashing algorithm.

In the case of #2, the problem seems nonexistent to me. No additional instances have been made equals(), so no additional instances are required to have equal hash codes. At worst, the hash algorithm may yield poorer performance for hash maps, which is outside the scope of this question.

In the case of #1, the standard unit test entails creating two instances of the same class with the same data passed to the constructor, and verifying equal hash codes. What about false positives? It's possible to pick constructor parameters that just happen to yield equal hash codes on a nonetheless unsound algorithm. A unit test that tends to avoid such parameters would fulfill the spirit of this question. The shortcut here is to inspect the source code for equals(), think hard, and write a test based on that. While this may be necessary in some cases, there may also be common tests that catch common problems - and such tests also fulfill the spirit of this question.

For example, if the class to be tested (call it Data) has a constructor that takes a String, and instances constructed from Strings that are equals() are themselves equals(), then a good test would probably test:

  • new Data("foo")
  • another new Data("foo")

We could even check the hash code for new Data(new String("foo")), to force the String to not be interned, although in my opinion that's more likely to yield a correct hash code than Data.equals() is to yield a correct result.
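
The checks described above might look roughly like the following. This is a minimal sketch under stated assumptions: the body of Data (a single String field, with equals and hashCode delegating to it) and the DataContractCheck helper are hypothetical, since the question does not show the real class.

```java
import java.util.Objects;

// Hypothetical immutable class under test; the single String field is an assumption.
final class Data {
    private final String value;

    Data(String value) {
        this.value = Objects.requireNonNull(value);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof Data && value.equals(((Data) other).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}

// Hypothetical helper: checks one pair of instances that are expected to be equal.
final class DataContractCheck {
    private DataContractCheck() {}

    static void checkEqualPair(Data a, Data b) {
        if (!a.equals(b) || !b.equals(a)) {
            throw new AssertionError("instances built from equal Strings should be equal");
        }
        if (a.hashCode() != b.hashCode()) {
            throw new AssertionError("equal instances must have equal hash codes");
        }
    }
}
```

A test would then call DataContractCheck.checkEqualPair(new Data("foo"), new Data("foo")) and DataContractCheck.checkEqualPair(new Data("foo"), new Data(new String("foo"))); the second call passes a String that is a distinct object from the interned literal.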

Eli Courtwright's answer is an example of thinking hard about a way to break the hash algorithm based on knowledge of the equals specification. The example of a special collection is a good one, as user-made Collections do turn up at times, and are quite prone to muckups in the hash algorithm.

This is one of the few cases where I would have multiple asserts in a test. Since you need to test the equals method, you should also check the hashCode method at the same time. So in each of your equals method test cases, check the hashCode contract as well.

A one = new A(...);
A two = new A(...);
assertEquals("These should be equal", one, two);
int oneCode = one.hashCode();
assertEquals("HashCodes should be equal", oneCode, two.hashCode());
assertEquals("HashCode should not change", oneCode, one.hashCode());

And of course checking for a good hashCode is another exercise. Honestly, I wouldn't bother with the double check that the hashCode doesn't change within the same run. That sort of problem is better handled by catching it in a code review and helping the developer understand why that's not a good way to write hashCode methods.

If I have a class Thing then, as most others do, I write a class ThingTest, which holds all the unit tests for that class. Each ThingTest has a method

 public static void checkInvariants(final Thing thing) {
    ...
 }

and, if the Thing class overrides hashCode and equals, it also has a method

 public static void checkInvariants(final Thing thing1, final Thing thing2) {
    ObjectTest.checkInvariants(thing1, thing2);
    // ... invariants that are specific to Thing
 }

That method is responsible for checking all invariants that are designed to hold between any pair of Thing objects. The ObjectTest method it delegates to is responsible for checking all invariants that must hold between any pair of objects. As equals and hashCode are methods of all objects, that method checks that hashCode and equals are consistent.
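
The answer does not show the body of ObjectTest.checkInvariants, so the following is only a guess at what such a pairwise check might contain. The particular set of checks and the private require helper are assumptions, not the author's actual code.

```java
// Hypothetical sketch of the pairwise check that the Thing-specific method delegates to.
final class ObjectTest {
    private ObjectTest() {}

    static void checkInvariants(Object o1, Object o2) {
        // Every object must be equal to itself.
        require(o1.equals(o1), "equals must be reflexive for the first object");
        require(o2.equals(o2), "equals must be reflexive for the second object");
        // Both directions of the comparison must give the same answer.
        require(o1.equals(o2) == o2.equals(o1), "equals must be symmetric");
        // No object is equal to null.
        require(!o1.equals(null) && !o2.equals(null), "equals(null) must be false");
        // Equal objects must report equal hash codes; unequal objects may collide.
        if (o1.equals(o2)) {
            require(o1.hashCode() == o2.hashCode(), "equal objects must have equal hash codes");
        }
    }

    private static void require(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }
}
```

Because the method takes plain Objects, it can be exercised with any pair, for example ObjectTest.checkInvariants("a", new String("a")) for an equal pair and ObjectTest.checkInvariants("a", "b") for an unequal one.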

I then have some test methods that create pairs of Thing objects, and pass them to the pairwise checkInvariants method. I use equivalence partitioning to decide what pairs are worth testing. I usually create each pair to differ in only one attribute, plus a test that checks two equivalent objects.

I also sometimes have a 3-argument checkInvariants method, although I find that it is less useful for finding defects, so I do not do this often.
