Can two strings of different length have the same hashcode?

Although I am aware that two different Strings can return the same hashcode, I have been unable to find anything about two Strings of differing lengths doing so. Is this possible? If so, examples would be appreciated. This is using the Java hashCode() function, in case that changes anything.

Hashcodes are distributed over the space of an int. There are only 2^32 (about 4 billion) possible values for an int. There are far more possible strings than that, so by the pigeonhole principle, there must exist multiple strings with the same hash code.

However, this does not prove that strings of different lengths can have the same hash code, as pointed out below. Java uses the formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] for hashing strings, where n is the length of the string. Knowing this, it is easy to construct strings of different length that have the same hash code:

Let String s1 = "\001!"; and String s2 = "@";. Then s1.length() != s2.length(), but s1.hashCode() == '\001' * 31 + '!' == 1 * 31 + 33 == 64, and s2.hashCode() == '@' == 64.
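As a quick check of the example above, here is a minimal sketch. The class name DifferentLengthCollision and the helper manualHash are made up for illustration; manualHash just re-implements the formula quoted earlier so it can be compared against String.hashCode().

```java
public class DifferentLengthCollision {
    // Hypothetical helper: evaluates s[0]*31^(n-1) + ... + s[n-1] with
    // Horner's rule. int overflow wraps around, as in String.hashCode().
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s1 = "\001!"; // char with value 1, followed by '!' (33)
        String s2 = "@";     // '@' has value 64

        System.out.println(s1.length() + " vs " + s2.length());   // 2 vs 1
        System.out.println(s1.hashCode() + " " + s2.hashCode());  // 64 64
        System.out.println(manualHash(s1) == s1.hashCode());      // true
    }
}
```

Note that "\001" is an octal escape for a single char, so s1 really has length 2.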

However, let me again say that there are over 4 billion possible values of an int, so your probability of a collision between any two given strings is low. It is not as low as you might think, though, because of the Birthday Paradox: you have about a 50% chance of a collision after about 77K hashes (assuming hashes are randomly distributed, which really depends on your data; if you mostly deal with very short strings you will have more frequent collisions). Every data structure that uses hashing must either deal with collisions (e.g. a common way is to use linked lists at each hash position) or deal with loss of data (e.g. in a Bloom filter).
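The 77K figure can be sanity-checked with the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / (2d)), for n values drawn uniformly from d possibilities. This is only a rough sketch under that uniformity assumption; the class and method names are invented for the example.

```java
public class BirthdayEstimate {
    // Approximate probability of at least one collision among n values
    // drawn uniformly at random from d possible values.
    static double collisionProbability(long n, double d) {
        return 1.0 - Math.exp(-(double) n * (n - 1) / (2.0 * d));
    }

    public static void main(String[] args) {
        double d = Math.pow(2, 32); // number of distinct int values
        // Comes out close to one half for n around 77,000.
        System.out.printf("%.3f%n", collisionProbability(77_000, d));
    }
}
```

Real String hash codes are not uniformly distributed, so treat the result as a ballpark figure only.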

Yes, this can happen.

Some rather trivial examples:

  • initial zero-valued characters don't affect the hash-code, so (for example) "foo", "\0foo", "\0\0foo", etc., all have the same hash-code.
  • each character just gets multiplied by 31 before adding the next character; so (for example) the two-character string new String(new char[] { 12, 13 }) has the same hash-code as the single-character new String(new char[] { 12 * 31 + 13 }) (where I selected 12 and 13 arbitrarily; the same works for any other values, as long as the 12 * 31 + 13 analogue stays within the two-byte-unsigned-integer range of a char).

But those are just some easy-to-construct examples. There are also plenty of pairs of strings that just happen to work out to have the same hash-code, despite no obvious relationship between them.
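Both constructions in the list above can be tried directly. The sketch below uses only String.hashCode(); the class name TrivialCollisions is arbitrary.

```java
public class TrivialCollisions {
    public static void main(String[] args) {
        // Leading zero-valued chars add 0 * 31^k to the sum, so they
        // change the length but not the hash code.
        System.out.println("foo".hashCode() == "\0foo".hashCode());   // true
        System.out.println("foo".hashCode() == "\0\0foo".hashCode()); // true

        // Two chars (12, 13) hash to 12 * 31 + 13 = 385, which is also
        // the hash code of the single char with value 385.
        String two = new String(new char[] { 12, 13 });
        String one = new String(new char[] { 12 * 31 + 13 });
        System.out.println(two.hashCode() == one.hashCode());         // true
    }
}
```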
