简体   繁体   English

如果哈希码不同,为什么HashSet允许相等的项?

[英]Why does HashSet allow equal items if hashcodes are different?

The HashSet class has an add(Object o) method, which is not inherited from another class. HashSet类有一个add(Object o)方法,该方法不是从另一个类继承的。 The Javadoc for that method says the following: 该方法的Javadoc说明如下:

Adds the specified element to this set if it is not already present. 如果指定的元素尚不存在,则将其添加到此集合中。 More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)) . 更正式地,如果此集合不包含元素e2 (e==null ? e2==null : e.equals(e2)) ,则将指定元素e添加到此集合。 If this set already contains the element, the call leaves the set unchanged and returns false . 如果此set已包含该元素,则调用将保持set不变并返回false

In other words, if two objects are equal, then the second object will not be added and the HashSet will remain the same. 换句话说,如果两个对象相等,则不会添加第二个对象,并且HashSet将保持不变。 However, I've discovered that this is not true if objects e and e2 have different hashcodes, despite the fact that e.equals(e2) . 但是,我发现如果对象ee2具有不同的哈希码,则不是这样,尽管事实上是e.equals(e2) Here is a simple example: 这是一个简单的例子:

import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;

public class BadHashCodeClass {

    /**
     * A hashcode that will randomly return an integer, so it is unlikely to be the same
     */
    @Override
    public int hashCode(){
        return new Random().nextInt();
    }

    /**
     * An equal method that will always return true
     */
    @Override
    public boolean equals(Object o){
        return true;
    }

    public static void main(String... args){
        HashSet<BadHashCodeClass> hashSet = new HashSet<>();
        BadHashCodeClass instance = new BadHashCodeClass();
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Instance was added: " + hashSet.add(instance));
        System.out.println("Elements in hashSet: " + hashSet.size());

        Iterator<BadHashCodeClass> iterator = hashSet.iterator();
        BadHashCodeClass e = iterator.next();
        BadHashCodeClass e2 = iterator.next();
        System.out.println("Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): " + (e==null ? e2==null : e.equals(e2)));
    }

The results from the main method are: 主要方法的结果是:

Instance was added: true
Instance was added: true
Elements in hashSet: 2
Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): true

As the example above clearly shows, HashSet was able to add two elements where e.equals(e2) . 如上面的例子清楚地显示,HashSet能够在e.equals(e2)添加两个元素。

I'm going to assume that this is not a bug in Java and that there is in fact some perfectly rational explanation for why this is. 我将假设这不是 Java中的错误,实际上有一些完全合理的解释为什么会这样。 But I can't figure out what exactly. 但我无法弄清楚到底是什么。 What am I missing? 我错过了什么?

I think what you're really trying to ask is: 我想你真正要问的是:

"Why does a HashSet add objects with inequal hash codes even if they claim to be equal?" “为什么HashSet添加具有不等哈希码的对象,即使它们声称是相等的?”

The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. 我的问题和你发布的问题之间的区别在于你假设这种行为是一个错误,因此从这个角度来看你会感到悲伤。 I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question. 我认为其他海报已经完全解释了为什么这不是一个错误,但是他们没有解决潜在的问题。

I will try to do so here; 我会在这里尝试这样做; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing. 我建议改写你的问题,以消除Java中糟糕的文档/错误的指控,这样你就可以更直接地探究为什么你遇到了你所看到的行为。


The equals() documentations states (emphasis added): equals()文档说明(强调添加):

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes . 请注意,通常需要在重写此方法时覆盖hashCode方法,以便维护hashCode方法的常规协定,该方法声明相等的对象必须具有相等的哈希代码

The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. equals()hashCode()之间的契约不仅仅是Java规范中令人讨厌的怪癖。 It provides some very valuable benefits in terms of algorithm optimization. 它在算法优化方面提供了一些非常有价值的好处。 By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. 通过假设a.equals(b)暗示a.hashCode() == b.hashCode()我们可以做一些基本的等价测试而无需直接调用equals() In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false. 特别是,上面的不变量可以转换 - a.hashCode() != b.hashCode()意味着a.equals(b)将为false。

If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry , defined like so: 如果你看一下HashMap的代码( HashSet内部使用),你会注意到一个内部静态类Entry ,定义如下:

static class Entry<K,V> implements Map.Entry<K,V> {
  final K key;
  V value;
  Entry<K,V> next;
  int hash;
  ...
}

HashMap stores the key's hash code along with the key and value. HashMap存储密钥的哈希码以及密钥和值。 Because a hash code is expected to not change over the time a key is stored in the map (see Map 's documentation, " The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map. ") it is safe for HashMap to cache this value. 由于哈希码在密钥存储在地图中的时间内不会发生变化(请参阅Map的文档,“ 如果对象的值以影响等于比较的方式更改,则不会指定地图的行为而对象是地图中的一个键。 “) HashMap可以安全地缓存此值。 By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected. 通过这样做,它只需要为映射中的每个键调用一次hashCode() ,而不是每次检查键时。

Now lets look at the implementation of put() , where we see these cached hashes being taken advantage of, along with the invariant above: 现在让我们看一下put()的实现,我们看到这些缓存的哈希被利用,以及上面的不变量:

public V put(K key, V value) {
  ...
  int hash = hash(key);
  int i = indexFor(hash, table.length);
  for (Entry<K,V> e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
      // Replace existing element and return
    }
  }
  // Insert new element
}

In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation . 特别注意,由于短路评估 ,如果哈希码相等且密钥不是完全相同的对象,则条件只调用key.equals(k) By the contract of these methods, it should be safe for HashMap to skip this call. 通过这些方法的契约, HashMap跳过此调用应该是安全的。 If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set. 如果你的对象被错误地实现,那么HashMap做出的这些假设就不再正确,你将得到不可用的结果,包括你的集合中的“重复”。


Note that your claim " HashSet ... has an add(Object o) method, which is not inherited from another class " is not quite correct. 请注意,您的声明“ HashSet ...有一个add(Object o)方法,它不是从另一个类继承的 ”,这不太正确。 While its parent class , AbstractSet , does not implement this method, the parent interface , Set , does specify the method's contract. 虽然它的父 AbstractSet没有实现此方法,但父接口 Set确实指定了方法的契约。 The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)) . Set接口不关注哈希,只关注相等,因此它根据等式(e==null ? e2==null : e.equals(e2))指定此方法的行为。 As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. 只要您遵循合同, HashSet就会按照文档记录,但尽可能避免实际浪费的工作。 As soon as you break the rules however, HashSet cannot be expected to behave in any useful way. 但是,一旦违反规则,就不能指望HashSet以任何有用的方式运行。

Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator , you would similarly see nonsensical results. 还要考虑如果您尝试使用错误实现的ComparatorTreeSet存储对象,您同样会看到无意义的结果。 I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet? 我记录了一些TreeSet在另一个问题中使用不可靠的Comparator时的行为示例: 如何在Java中为StringBuffer类实现比较器以便在TreeSet中使用?

You've violated the contract of equals / hashCode basically: 你基本违反了equals / hashCode的契约:

From the hashCode() docs: 来自hashCode()文档:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. 如果两个对象根据equals(Object)方法相等,则对两个对象中的每一个调用hashCode方法必须生成相同的整数结果。

and from equals : 并从equals

Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes. 请注意,通常需要在重写此方法时覆盖hashCode方法,以便维护hashCode方法的常规协定,该方法声明相等的对象必须具有相等的哈希代码。

HashSet relies on equals and hashCode being implemented consistently - the Hash part of the name HashSet basically implies "This class uses hashCode for efficiency purposes." HashSet依赖于equalshashCode一致地实现 - 名称HashSetHash部分基本上暗示“此类使用hashCode来提高效率”。 If the two methods are not implemented consistently, all bets are off. 如果两种方法没有一致地实施,那么所有的赌注都会被取消。

This shouldn't happen in real code, because you shouldn't be violating the contract in real code... 这不应该在实际代码中发生,因为你不应该违反实际代码中的合同......

@Override
public int hashCode(){
    return new Random().nextInt();
}

You are returning different has codes for same object every time it is evaluated. 每次评估时,您都会返回相同对象的不同代码。 Obviously you will get wrong results. 显然你会得到错误的结果。


add() function is as follows add()函数如下

public boolean add(E e) {
return map.put(e, PRESENT)==null;
}

and put() is 和put()是

public V put(K key, V value) {
    if (key == null)
        return putForNullKey(value);
    int hash = hash(key.hashCode());
    int i = indexFor(hash, table.length);
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }

    modCount++;
    addEntry(hash, key, value, i);
    return null;
}

If you notice first has is calculated which is different in your case which is why object is added. 如果您注意到第一个已计算出的情况与您的情况不同,这就是添加对象的原因。 equals() comes into picture only if hash are same for objects ie collision has occured. 只有当对象的哈希值相同(即碰撞发生)时,equals()才会出现。 Since in case hash are different equals() is never executed 由于哈希值不同,因此从不执行equals()

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

Read more on what short circuiting is. 阅读更多关于短路的信息。 since e.hash == hash is false nothing else is evaluated. 由于e.hash == hash为false,因此不会评估任何其他内容。

I hope this helps. 我希望这有帮助。

because hashcode() is really implemented very badly , 因为hashcode()实际上非常糟糕

it will try to equate in each random bucket on each add() , if you return constant value from hashcode() it wouldn't let you enter any 它将尝试在每个add()上的每个随机桶中等同,如果从hashcode()返回常量值,它将不允许您输入任何

It is not required that hash codes be different for all elements! 并不要求所有元素的哈希码都不同! It is only required that two elements are not equal. 只需要两个元素不相等。

HashCode is used first to find the hash bucket the object should occupy. 首先使用HashCode来查找对象应占用的哈希桶。 If hadhcodes are different, objects are assumed to be not equal. 如果hadhcodes不同,则假定对象不相等。 If hashcodes are equal, then the equals() method is used to determine equality. 如果哈希码相等,则使用equals()方法确定相等性。 The use of hashCode is an efficiency mechanism. hashCode的使用是一种效率机制。

And... 和...
Your hash code implementation violates the contract that it should not change unless the objects identifying fields change. 您的哈希代码实现违反了它不应该更改的合同,除非标识字段的对象发生更改。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM