Can two strings of different length have the same hashcode?

Original 2016-10-06 04:53:15 0 2 java/ string/ hashcode

Although I am aware that two different Strings can return the same hash code, I have been unable to find anything about two Strings of differing lengths doing so. Is this possible? If so, examples would be appreciated. This is using Java's String.hashCode(), in case that changes anything.

2 answers

Hash codes are distributed over the space of an int. There are only 2^32 (about 4.3 billion) possible values for an int. There are far more possible strings than that, so by the pigeonhole principle, there must exist multiple strings with the same hash code.

However, this does not prove that strings of different length can have the same hash code, as pointed out below. Java uses the formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] for hashing strings, where n is the length of the string and the sum is computed in int arithmetic (so it wraps around on overflow). Knowing this, it is easy to construct strings of different length that have the same hash code:

Let String s1 = "\001!"; and String s2 = "@";. Then s1.length() != s2.length(), but s1.hashCode() == '\001' * 31 + '!' == 1 * 31 + 33 == 64 and s2.hashCode() == '@' == 64.
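To check the arithmetic, here is a minimal sketch that evaluates the formula by hand and compares it with the built-in String.hashCode(). The class name DifferentLengthCollision and the helper polyHash are made up for this illustration; they are not part of the JDK.

```java
public class DifferentLengthCollision {
    // Hypothetical helper: evaluates s[0]*31^(n-1) + ... + s[n-1] using
    // Horner's rule. int arithmetic wraps on overflow, as in String.hashCode().
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s1 = "\001!"; // two chars: U+0001 followed by '!' (33)
        String s2 = "@";     // one char: '@' (64)

        System.out.println(s1.length() + " vs " + s2.length());     // 2 vs 1
        System.out.println(s1.hashCode() + " vs " + s2.hashCode()); // 64 vs 64
        System.out.println(polyHash(s1) == s1.hashCode());          // true
    }
}
```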

However, let me again say that there are over 4 billion possible values of an int, so the probability that any two given strings collide is low. Across a whole set of strings it is not as low as you might think, because of the Birthday Paradox: you have about a 50% chance of at least one collision after about 77K hashes. This assumes the hashes are uniformly distributed, which really depends on your data; if you mostly deal with very short strings, you will see collisions more often. Every data structure that uses hashing must deal with collisions, though (e.g. a common way is to keep a linked list at each hash position), or accept some loss of information (e.g. in a Bloom filter).
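The 77K figure can be reproduced with the usual birthday-problem approximation n ≈ sqrt(2 · N · ln(1 / (1 - p))), where N is the number of possible hash values and p is the target collision probability. The sketch below assumes uniformly random hashes, which real string data need not satisfy; the class and method names are invented for the example.

```java
public class BirthdayBound {
    // Approximate number of uniformly random draws from `buckets` possible
    // values needed to reach collision probability p (birthday approximation).
    static double drawsForCollisionProbability(double buckets, double p) {
        return Math.sqrt(2.0 * buckets * Math.log(1.0 / (1.0 - p)));
    }

    public static void main(String[] args) {
        double buckets = Math.pow(2, 32); // number of distinct int values
        double n = drawsForCollisionProbability(buckets, 0.5);
        System.out.printf("about %.0f hashes for a 50%% collision chance%n", n);
    }
}
```

On a typical JVM this prints a value a little above 77,000, in line with the estimate above.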

Yes, this can happen.

Some rather trivial examples:

initial zero-valued characters don't affect the hash-code, so (for example) "foo" , "\0foo" , "\0\0foo" , etc., all have the same hash-code.
each character just gets multiplied by 31 before adding the next character; so (for example) the two-character string new String(new char[] { 12, 13 }) has the same hash-code as the single-character new String(new char[] { 12 * 31 + 13 }) (where I selected 12 and 13 arbitrarily; the same works for any other values, as long as the 12 * 31 + 13 analogue stays within the range of a char, i.e. 0 to 65535).
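Both constructions are easy to verify. The following is a small sketch; EasyCollisions and sameHash are names chosen for this example only.

```java
public class EasyCollisions {
    // Hypothetical helper: true when both strings have the same hashCode().
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // Leading U+0000 characters contribute 0 to the polynomial.
        System.out.println(sameHash("foo", "\0foo"));    // true
        System.out.println(sameHash("foo", "\0\0foo"));  // true

        // Two chars (12, 13) versus the single char 12 * 31 + 13 = 385.
        String two = new String(new char[] { 12, 13 });
        String one = new String(new char[] { (char) (12 * 31 + 13) });
        System.out.println(two.length() + " vs " + one.length()); // 2 vs 1
        System.out.println(sameHash(two, one));                   // true
    }
}
```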

But those are just some easy-to-construct examples. There are also plenty of pairs of strings that just happen to work out to have the same hash-code, despite no obvious relationship between them.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

solution1: 3, ACCEPTED, 2016-10-06 05:06:35
solution2: 2, 2016-10-06 05:16:09
