简体   繁体   中英

Java - How can hashCode() function output small (or negative) number when string is big

I made this function and it works the same as Original Java function when you input something short, but if i input something larger than 5-7 characters - then I get some realy big number. (And not the right hashcode)

Here is the formula of Java's hash function:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Simplier one (Only works for short strings):

s = "abc" //String
n = 3 //Lenght of the String
s[0] = 'a'. ASCII code of 'a' = 97.
97 * (31 ^ (n - 1))
97 * (31 ^ (2))
97 * 961 = 93217

s[1] = 'b'. ASCII code of 'b' = 98.
98 * (31 ^ (n - 2))
98 * (31 ^ 1)
98 * 31 = 3038

s[2] = 'c'. ASCII code of 'c' = 99.
99 * (31 ^ (n - 3))
99 * (31 ^ 0)
99 * 1 = 99

93217 + 3038 + 99 = 96354 //

I want to know how does Java makes hash small even when I enter a huge string.

Java's hashcode of "Hello" - 69609650
My hashcode of "Hello" - 69609650


Java's hashcode of "Welcome to Tutorialspoint.com" - 1186874997
My hashcode of "Welcome to Tutorialspoint.com" - 5.17809991536626e+43

Also how can hash be negative if we add up numbers ?

I suspect your implementation (which you haven't shown) uses BigInteger or something similar. Java just uses int - so when it overflows the range of positive 31-bit integers, it goes into large negative integers, and then as you add more (positive) values, you'll end up with small negative integers, then small positive integers, then large positive integers - and back to large negative integers.

String's hashCode involves only int addition and multiplication, so it results in an int , which may overflow (hence the negative values).

public int hashCode() {
    int h = hash;
    int len = count;
    if (h == 0 && len > 0) {
        int off = offset;
        char val[] = value;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

Based on your 5.17809991536626e+43 value, it looks like you are doing floating point calculations (perhaps you are using Math.pow() which returns a double ), which give different results for large numbers.

Source code for String$hashCode() :

 1494       public int hashCode() {
 1495           int h = hash;
 1496           if (h == 0 && count > 0) {
 1497               int off = offset;
 1498               char val[] = value;
 1499               int len = count;
 1500   
 1501               for (int i = 0; i < len; i++) {
 1502                   h = 31*h + val[off++];
 1503               }
 1504               hash = h;
 1505           }
 1506           return h;
 1507       }

int is a signed integer on 4 bytes and it will just overflow during the hash computation, yielding a value that can be negative but is always bound by int .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM