简体   繁体   中英

Condition for inserting an object in HashSet?

Please bear with me as i am trying to introduce a new concept in direct contradiction with many active threads.

What is the condition for inserting an object in HashSet?

Looking at the source code, it zeroes in to :

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

Full code at: HashSet.java

So, it depends upon

  1. Hashcode
  2. equals()
  3. == ie if they are same objects.

Now, we know that hashcode of two objects has to be same if obj1.equals(obj2) returns true. Based on the relative values of these 3 parameters i have created the following table : HashCode添加条件

Look at condition no. 4. Despite equals() returns false the object gets added to the HashSet. In all the other cases the object simply gets added if and only if equals() returns false. So, one could say (ignoring condition number 4) that the decision whether an object will be added to the HashSet or not is taken simply by the equals() method. On being asked why do we use hashCode() the standard reply is that it improves the performance by simply comparing integers as the short-circuit operator saves the execution of equals() method. This argument is discussed in many threads like Why do we check hash if we are going to check equals anyways?

However, i find this argument to be incorrect. Hashcode actually has a decision to take if equals() returns false and == returns true. It's highly unlikely because same object usually return true for equals(), until someone explicitly (violating the equals() contract) overrides the equals method such that it returns different values for same object. Still, its a possibility and java appears to be providing a risk management in case of some defaulter code. Your take !

HashSet requires that objects passed to it obey the contract for hashCode and equals — if they don't, then garbage-in-garbage-out. The contract for equals states that if two references are == , they must be equal. So your condition 4 above is one that violates the contract for equality, and thus violates the contract for HashSet , and thus HashSet isn't obligated to act meaningfully when presented such a set of conditions.

Condition 5 also breaks the contract.

Contract of equals()

If two references are equal or same (==), then equals() should return true.

Contract of hashCode()

If equals() method return true for two objects, then hashCode() also needs to return the same hash value for those two objects.

Truth table

So let's consider the truth-table of 8 scenarios, and there are only 4 valid scenarios as shown below.

| hashCode() | equals() |   ==   | add() |
| not-same   | false    | false  | true  |
| not-same   | false    | true   |   -   | - INVALID scenario (== vs equals)
| not-same   | true     | false  |   -   | - INVALID scenario (hash vs equals)
| not-same   | true     | true   |   -   | - INVALID scenario (hash vs equals)
| same       | false    | false  | true  |
| same       | false    | true   |   -   | - INVALID scenario (== vs equals)
| same       | true     | false  | false | 
| same       | true     | true   | false |

In the table of the question; S.No 4 & 5 are invalid due to == vs equals() contract.

Your truth table is incomplete. It should have eight rows, as follows:

# HashCode Equals  ==    add()
- -------- ------ ------ -----
1   same     TRUE  TRUE  FALSE
2   same     TRUE FALSE  FALSE
3   same    FALSE FALSE   TRUE
4   diff    FALSE FALSE   TRUE
======= ILLEGAL ROWS =========
5   diff     TRUE  TRUE   TRUE -- Breaks the contract of hashCode, which must
                               -- return the same value on multiple calls
6   diff     TRUE FALSE   TRUE -- Breaks the contract of hashCode
7   same    FALSE  TRUE  FALSE -- Breaks the contract of equals
8   diff    FALSE  TRUE  FALSE -- Breaks the contract of equals

Row #5 represents a situation when hashCode returns different values when you call it several times (this is an extremely bad thing, but it may occasionally happen when the object is mutable).

Row #6 represents a situation when two equal items have different hashCode - a violation of hashCode contract.

The last two rows, #7 and #8, are illegal, because they breaks the requirement of equals() to be reflexive (ie x.equals(x) must return true for all non-null x ).

Rows #4 and #5 from your table represent illegal state. HashSet would never find out, though, because the first clause of OR is a mere optimization. Due to short-circuiting, there would be no call to equals when == evaluates to true , so HashSet effectively assumes the reflexivity of equals , even if the implementation is wrong.

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

e.hash == hash is present in this condition for reasons of efficiency, it is (under sane circumstances) the quickest test to perform and is used to discount equality at the first hurdle. In all cases where the program is in a valid state (not violating the == .equals() .hashCode() contract) it has no logical effect on the end result of the if statement.

Conditions which are a result of breaking the == .equals() .hashCode() contract are not considered because such a program is in an invalid state and behaviour is not defined. Effects under a broken contract are likely to change from implimentation to implimentation and so should never be relied upon.

You aren't addressing the order of operations. The real table will include DC (ie Don't Care) because they will not be evaluated.

If we add when the following is FALSE

if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

Then

hash==  && (key== || equals()) >> add?
---------------------------------------
   T     |    T    |   DC     ||  F
   F     |    DC   |   DC     ||  T
   T     |    F    |   T      ||  F  <- Not possible with correct code
   T     |    F    |   F      ||  F  <- Not possible with correct code

If either the hash or equals functions are incorrect, then none of this matters.

All you need Is Valid condition then you can directly go for this .

if(!Obj1.equals(Obj2)) add() ;

As you can see there are 8 cases possible , out of which 4 are only valid lets proceed with them .

╔════════════╦══════════╦═══════╦═══════╗
║ hashCode() ║ equals() ║  ==   ║ add() ║
╠════════════╬══════════╬═══════╬═══════╣
║ not-same   ║ false    ║ false ║ true  ║
║ same       ║ false    ║ false ║ true  ║
║ same       ║ true     ║ false ║ false ║
║ same       ║ true     ║ true  ║ false ║
╚════════════╩══════════╩═══════╩═══════╝

So now its clear we add only when equals() is false .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM