简体   繁体   中英

Can two strings of different length have the same hashcode?

Although I am aware that two different Strings can return the same hashcode, I have been unable to find anything about two of differing lengths doing so. Is this possible, and if so, examples would be appreciated. This is using the java hashcode function, in case that changes anything.

Hashcodes are distributed over the space of an int . The are only 2^32 = ~4 billion possible values for an int . There are well more than that number possible strings, so by the pigeonhole principle, there must exist multiple strings with the same hash codes.

However, this does not prove different length strings might have the same hash code, as pointed out below. Java uses the formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] for hashing strings. Knowing this, it is easy to construct strings of different length that have the same hash code:

Let String s1 = "\\001!"; and String s2 = "@"; . Then s1.length() != s2.length() but s1.hashCode() == '\\001' * 31 + '!' == 1 * 31 + 33 == 64 == s2.hashCode() == '@' == 64 s1.hashCode() == '\\001' * 31 + '!' == 1 * 31 + 33 == 64 == s2.hashCode() == '@' == 64 .

However, let me again say that there are over 4 billion possible values of an int , so your probability of collision is low, although not as low as you might think, because of the Birthday Paradox , which gives that you have about a 50% chance of a collision after about 77K hashes (assuming hashes are randomly distributed, which really depends on your data - if you mostly deal with very small length strings you will have more frequent collisions). Every data structure that uses hashing deals must deal with collisions, though (eg a common way is to use linked lists at each hash position), or deal with loss of data (eg in a bloom filter).

Yes, this can happen.

Some rather trivial examples:

  • initial zero-valued characters don't affect the hash-code, so (for example) "foo" , "\\0foo" , "\\0\\0foo" , etc., all have the same hash-code.
  • each character just gets multiplied by 31 before adding the next character; so (for example) the two-character string new String(new char[] { 12, 13 }) has the same hash-code as the single-character new String(new char[] { 12 * 31 + 13 }) (where I selected 12 and 13 arbitrarily; the same works for any other values, as long as the 12 * 31 + 13 analogue stays within the two-byte-unsigned-integer range).

But those are just some easy-to-construct examples. There are also plenty of pairs of strings that just happen to work out to have the same hash-code, despite no obvious relationship between them.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM