简体   繁体   中英

Designing hashCode method Java

I'm studying Item 9, Effective Java [Always override hashcode() when you override equals].

I have a few queries regarding the points made by author :

  1. The author says:

A nonzero initial value is used in step 1 so the hash value will be affected by initial fields whose hash value, as computed in step 2.a, is zero. If zero were used as the initial value in step 1, the overall hash value would be unaffected by any such initial fields, which could increase collisions. The value 17 is arbitrary.

Step 2.a is:

For each significant field f in your object (each field taken into account by the equals method, that is), do the following: a. Compute an int hash code c for the field:

i. If the field is a boolean ,compute (f ? 1 : 0) .

ii. If the field is a byte , char , short , or int , compute (int) f .

iii. If the field is a long , compute (int) (f^ (f >>> 32)) .

iv. If the field is a float , compute Float.floatToIntBits(f) .

v. If the field is a double , compute Double.doubleToLongBits(f) , and then hash the resulting long as in step 2.a.iii.

vi. If the field is an object reference and this class's equals method compares the field by recursively invoking equals , recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null , return 0 (or some other constant, but 0 is traditional).

vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.

Suppose result is calculated as:

result = 31 * result + areaCode;      
result = 31 * result + prefix;
result = 31 * result + lineNumber;

In case initial value of result is 0 and all given fields above are 0, result would remain 0. But, even if the result isn't 0 initially, result would amount to the same constant every time the initial fields are 0 which would be: 31*(31*(31*17)). How would this value help in decreasing collisions?

  1. The last paragraph states that :

Many classes in the Java platform libraries, such as String , Integer , and Date , include in their specifications the exact value returned by their hashCode method as a function of the instance value. This is generally not a good idea, as it severely limits your ability to improve the hash function in future releases. If you leave the details of a hash function unspecified and a flaw is found or a better hash function discovered, you can change the hash function in a subsequent release, confident that no clients depend on the exact values returned by the hash function.

What does he means by saying that the exact value returned by hashCode is a function of the instance value?

Thanks in advance for any help.

See this example:

    String a = "Abc";
    String b = "Abc";
    String c = "Pqr";
    System.out.println(" "+a.hashCode()+" "+b.hashCode()+" "+c.hashCode());

Output: 65602 65602 80497

Which clearly shows that hashCode() of string depends on values.

Extract from hashCode() documentation:
int java.lang.String.hashCode()

Returns a hash code for this string. The hash code for a String object is computed as

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)

How would this value help in decreasing collisions?

Hash collision is primarily achieved by a good distribution across the whole hash range (here the integer type).

By defining 0 as the initial value for calculating the hash result, you have a somewhat restricted distribution in a small range. Objects that differ in a minor way - maybe in some field only - produce hash codes that are not far away from each other. This makes hash collisions more likely.

By defining a non-zero initial value, you simply increase the gaps between calculated hash codes for objects that differ only in a minor way. So you better utilize the hash range and effectively make hash collisions more unlikely.

What does he means by saying that the exact value returned by hashCode is a function of the instance value?

It simply means that you should calculate the hash code by using the object's value, ie the values of its fields. You already did it in your example, and I think that you already implicitly understood it.

But: Joshua Bloch intended to say something else with this paragraph: He wanted to warn you about not documenting the exact function how the hash code is calculated. If you do so, you restrict yourself to not being able anymore to change the implementation in future releases because some users might expect a specific implementation, and you would break some code depending on yours.

The hashCode implementation in Effective Java specifically instructs you to pick a non-zero value for the initial value of result. As to your second question, hashCode is supposed to produce the same value when the internal state used for equals comparisons of the object is the same. So the fact that you'd get the same value when the instance variables are all zero fulfills the contract of hashCode. Note that the entire subheading is 'Always override hashCode when you override equals'.

For your first question, if the 2 objects are equal they are supposed to return the same hash value, this is the reason why overriding the hash method is a good idea when you override the equals method. It doesn't avoid collisions of equal objects but it does make it less likely to have collisions when objects are not equal which is more important since we want to be able to find unique objects as fast as possible.

As for your second question, I don't pretend to have much experience with designing Hash code however I believe he means that certain Objects may only return a single Hash value (for example a Singleton).

He is saying that putting that value in the documentation is a bad practice since you may want to change the hash function later, or other variables within the hash function may change at a later date changing the return value.

Either way specifying the return value or relying on a specified return value is a bad idea.

First of all I want to say a very important thing that is often not clearly formulated:

Implementing hashcode for MOST CASES is NOT IMPORTANT. It breaks down to a performance issue only. So if you have a problem with hashcode and object identity, just return -1. You will have poor performance but a solid and correct implementation. But until you have thousands of objects that make use of hashcode you won't recognize the poor performance. by the way: "Collision" looks like a significant word in the context of hashcode. Yes it is, but only if performance is really an issue. A "Collision" of hashcode-values does not mean your program does not work correctly. It means that your program may run slower. Because the access with a key to a map will cause an sequential iteration over objects with the same hashcode. In high performance environments the may be an issue. In most cases not.

But what IS IMPORTANT if you override hashcode: you have to implement it CORRECTLY. So the definition should always be satisfied: if equals returns true, hashcode should return the same value.

One other thing: while you accidentally not face problems with it, calculating hashcode on non-immutable values is a bad idea. This is because once hashcode is used, the object is placed at a special position within the "Map". If the values change the hashcode depends on, this object may be lost or it becomes hardly accessible. This WILL affect correctness of your program.

Conclusion: Do only make use of hashcode if you really need the performance. And then you should make sure you apply it correctly. It's easy make errors here but these errors may be the hardest to identify.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM