
What Exactly is Hash Collision

Hash collision (or hashing collision) in HashMap is not a new topic, and I've come across several blogs and discussion boards that explain how to produce a hash collision or how to avoid one, either ambiguously or in excessive detail. I recently came across this question in an interview. I had a lot of things to explain, but I found it really hard to give a precise, correct explanation. Sorry if my questions are duplicates; if so, please point me to the precise answer:

  1. What exactly is a hash collision - is it a feature, or a common phenomenon that happens by mistake and is best avoided?
  2. What exactly causes a hash collision - a bad definition of a custom class's hashCode() method, OR leaving the equals() method un-overridden while imperfectly overriding hashCode() alone, OR is it not up to the developers, since many popular Java libraries also have classes that can cause hash collisions?
  3. Does anything go wrong or behave unexpectedly when a hash collision happens? I mean, is there any reason why we should avoid hash collisions?
  4. Does Java generate, or at least try to generate, a unique hashCode per class during object initialization? If not, is it right to rely on Java alone to ensure that my program will not run into hash collisions for JRE classes? If that is not right, then how do I avoid hash collisions in HashMaps that use final classes like String as keys?

I'll be grateful if you could share your answers to one or all of these questions.

What exactly is a hash collision - is it a feature, or a common phenomenon that happens by mistake and is best avoided?

It's a feature. It arises out of the nature of a hashCode: a mapping from a large value space to a much smaller value space. There are going to be collisions, by design and intent.

What exactly causes a hash collision - a bad definition of a custom class's hashCode() method,

A bad design can make it worse, but collisions are inherent in the very notion of hashing.

OR leaving the equals() method un-overridden while imperfectly overriding hashCode() alone,

No.

OR is it not up to the developers, since many popular Java libraries also have classes that can cause hash collisions?

This doesn't really make sense. Hashes are bound to collide sooner or later, and poor algorithms can make it sooner. That's about it.

Does anything go wrong or behave unexpectedly when a hash collision happens?

Not if the hash table is competently written. A hash collision only means that the hashCode is not unique, which forces a call to equals(), and the more collisions there are, the worse the performance.
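
To make that concrete, here is a minimal sketch. `BadKey` is a hypothetical class invented for this illustration, with a deliberately constant hashCode() so that every key collides. java.util.HashMap still returns correct results because it falls back to equals(); only the lookup cost suffers.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKeyDemo {
    /** Hypothetical key type whose hashCode() is deliberately terrible. */
    static final class BadKey {
        final String name;

        BadKey(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 42; // every instance collides with every other instance
        }
    }

    /** Builds a map with n distinct keys that all share one hash code. */
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey("key" + i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(1000);
        System.out.println(map.size());                  // 1000: no entry was lost
        System.out.println(map.get(new BadKey("key7"))); // 7: equals() picks the right entry
    }
}
```

The map behaves correctly, but every lookup has to search among all 1000 entries in the same bucket instead of jumping almost directly to one.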

I mean, is there any reason why we should avoid hash collisions?

You have to trade off ease of computation against spread of values. There is no single black-and-white answer.

Does Java generate, or at least try to generate, a unique hashCode per class during object initialization?

No. 'Unique hash code' is a contradiction in terms.

If not, is it right to rely on Java alone to ensure that my program will not run into hash collisions for JRE classes? If that is not right, then how do I avoid hash collisions in HashMaps that use final classes like String as keys?

The question is meaningless. If you're using String, you don't have any choice about the hashing algorithm, and you are also using a class whose hashCode has been slaved over by experts for twenty or more years.

Actually, I think hash collisions are normal. Let's think through an example. We have 1,000,000 big numbers (the set S of values x), where each x is below 2^64. Now we want to build a map for this set of numbers: let's map S into the range [0, 1000000).

But how? Use a hash!

Define a hash function f(x) = x mod 1000000. Now every x in S is converted into a value in [0, 1000000). But you will find that many numbers in S are converted into the same number. For example, all numbers of the form k * 1000000 + y (with 0 <= y < 1000000) land on y, because (k * 1000000 + y) % 1000000 = y. That is a hash collision.
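
A small sketch of that function in Java, to see the collisions directly. The class and constant names are made up for this illustration, and it assumes the inputs fit in a non-negative long.

```java
final class ModuloHashDemo {
    static final long BUCKETS = 1_000_000L;

    /** f(x) = x mod 1000000; floorMod keeps the result in [0, BUCKETS). */
    static long f(long x) {
        return Math.floorMod(x, BUCKETS);
    }

    public static void main(String[] args) {
        long y = 5;
        // Every number of the form k * 1000000 + y lands on y.
        for (long k = 0; k < 4; k++) {
            long x = k * BUCKETS + y;
            System.out.println(x + " -> " + f(x));
        }
    }
}
```

All four inputs (5, 1000005, 2000005, 3000005) map to 5, so any table indexed by f has to cope with several keys wanting the same slot.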

And how do we deal with collisions? In the case above, it is very difficult to eliminate them, because mathematically there is always some probability of a collision. We can find a more complex, better hash function, but we can never say for certain that we have eliminated collisions. We should still make an effort to find a better hash function to reduce them, because hash collisions increase the time it takes to find something via its hash.

Simply put, there are two ways to deal with hash collisions. A linked list is the more direct way: if two of the numbers above get the same value from the hash function, we create a linked list at that value's bucket, and every number with that hash value is put into that bucket's linked list. The other way is to find a new position for the later number. For example, if the number 1000005 has already taken position 5, then when 2000005 also hashes to 5 it cannot be placed at position 5, so it moves on and takes an empty position.
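
Here is a rough sketch of both strategies, usually called separate chaining and open addressing (linear probing here). It is a toy, not how java.util.HashMap is implemented: the table has only 10 slots (an assumption made so that 1000005 and 2000005 both still hash to 5), and all names are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class CollisionStrategiesDemo {
    static final int CAPACITY = 10; // tiny table so collisions are easy to see

    static int slot(long x) {
        return (int) Math.floorMod(x, (long) CAPACITY);
    }

    /** Strategy 1: each bucket holds a linked list of everything that hashed there. */
    static List<LinkedList<Long>> chain(long... values) {
        List<LinkedList<Long>> buckets = new ArrayList<>();
        for (int i = 0; i < CAPACITY; i++) {
            buckets.add(new LinkedList<>());
        }
        for (long v : values) {
            buckets.get(slot(v)).add(v);
        }
        return buckets;
    }

    /** Strategy 2: on a collision, walk forward to the next empty slot. */
    static Long[] probe(long... values) {
        Long[] table = new Long[CAPACITY];
        for (long v : values) {
            int i = slot(v);
            int tried = 0;
            while (table[i] != null) {
                if (++tried == CAPACITY) {
                    throw new IllegalStateException("table is full");
                }
                i = (i + 1) % CAPACITY; // wrap around at the end of the table
            }
            table[i] = v;
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(chain(1000005L, 2000005L).get(5)); // both values share bucket 5
        Long[] table = probe(1000005L, 2000005L);
        System.out.println(table[5] + " " + table[6]);        // second value moved to slot 6
    }
}
```

With chaining, a lookup walks the list in one bucket; with probing, a lookup starts at the hashed slot and walks forward until it finds the key or an empty slot. Either way, more collisions mean longer walks.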

For the last question: Does Java generate, or at least try to generate, a unique hashCode per class during object initialization?

The hashCode() of Object is typically implemented by converting the internal address of the object into an integer. So if you use Object's hashCode(), you can expect different objects to usually have different hash codes, although this is not guaranteed.

What exactly is a hash collision - is it a feature, or a common phenomenon that happens by mistake and is best avoided?

  • A hash collision is exactly that: a collision of the hashCode values of objects...

What exactly causes a hash collision - a bad definition of a custom class's hashCode() method, OR leaving the equals() method un-overridden while imperfectly overriding hashCode() alone, OR is it not up to the developers, since many popular Java libraries also have classes that can cause hash collisions?

  • No. Collisions may happen because they are governed by probability, and the birthday paradox is the best way to explain that.
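
To put rough numbers on the birthday paradox, here is a small sketch that computes the chance of at least one collision among n keys. It assumes every hash value is equally likely and independent of the others, which real hashCode() implementations only approximate; the method name is made up for the example.

```java
final class BirthdayBoundDemo {
    /**
     * Probability that at least two of n keys share a hash value, assuming
     * each key independently gets one of `space` equally likely values.
     */
    static double collisionProbability(long n, double space) {
        double allDistinct = 1.0;
        for (long i = 0; i < n; i++) {
            // The (i + 1)-th key must avoid the i values already taken.
            allDistinct *= 1.0 - i / space;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        // Classic birthday case: 23 people, 365 days, about 0.507.
        System.out.println(collisionProbability(23, 365));
        // 32-bit hash codes: 2^32 possible values.
        double intSpace = Math.pow(2, 32);
        System.out.println(collisionProbability(50_000, intSpace));
        System.out.println(collisionProbability(100_000, intSpace));
    }
}
```

Under that assumption, a collision among 32-bit hash codes becomes more likely than not somewhere between 50,000 and 100,000 keys, far below the 2^32 possible values.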

Does anything go wrong or behave unexpectedly when a hash collision happens? I mean, is there any reason why we should avoid hash collisions?

  • No. The String class in Java is a very well-developed class, and you don't need to search hard to find a collision (check the hashCode of the strings "Aa" and "BB": both hash to 2112).
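
That pair is easy to check yourself. The snippet below is just a quick demonstration (the class and method names are invented); the arithmetic in the comments follows the formula documented for String.hashCode().

```java
import java.util.HashMap;
import java.util.Map;

final class StringCollisionDemo {
    /** Puts both colliding strings into one HashMap as separate keys. */
    static Map<String, String> bothKeys() {
        Map<String, String> map = new HashMap<>();
        map.put("Aa", "first");
        map.put("BB", "second");
        return map;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode());   // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        System.out.println("BB".hashCode());   // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("Aa".equals("BB")); // false: same hash, different strings

        Map<String, String> map = bothKeys();
        System.out.println(map.size());        // 2: the collision did not merge the keys
        System.out.println(map.get("Aa") + " " + map.get("BB"));
    }
}
```

Both entries survive and each key still finds its own value, which is the point: the hash code only narrows the search, and equals() decides identity.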

To summarize: a hashCode collision is harmless if you know what a hash code is for and why it is not the same as an id used to prove equality.

What exactly is a hash collision - is it a feature, or a common phenomenon that happens by mistake and is best avoided?

Neither... both... It is a common phenomenon, not something done by mistake, but it is good to avoid.

What exactly causes a hash collision - a bad definition of a custom class's hashCode() method, OR leaving the equals() method un-overridden while imperfectly overriding hashCode() alone, OR is it not up to the developers, since many popular Java libraries also have classes that can cause hash collisions?

By poorly designing your hashCode() method, you can produce too many collisions. Leaving your equals() method un-overridden should not directly affect the number of collisions. Many popular Java libraries have classes that can cause collisions (nearly all classes, actually).

Does anything go wrong or behave unexpectedly when a hash collision happens? I mean, is there any reason why we should avoid hash collisions?

There is a degradation in performance, which is a reason to avoid them, but the program should continue to work.

Does Java generate, or at least try to generate, a unique hashCode per class during object initialization? If not, is it right to rely on Java alone to ensure that my program will not run into hash collisions for JRE classes? If that is not right, then how do I avoid hash collisions in HashMaps that use final classes like String as keys?

Java doesn't try to generate a unique hash code during object initialization, but it has default implementations of hashCode() and equals(). The default implementations are based on object identity (whether two references point to the same instance) and do not rely on the content (field values) of the objects. That is why the String class has its own implementation.
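
A short sketch of the difference. `Point` is a hypothetical class written for this example that overrides neither method, so it gets the identity-based defaults, while String compares by content.

```java
final class IdentityVsContentDemo {
    /** Hypothetical class that keeps Object's default equals() and hashCode(). */
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b)); // false: same fields, different instances
        System.out.println(a.equals(a)); // true: same instance
        // The default hashCode() is the identity hash code.
        System.out.println(a.hashCode() == System.identityHashCode(a)); // true

        String s = new String("hello");
        String t = new String("hello");
        System.out.println(s == t);                       // false: two instances
        System.out.println(s.equals(t));                  // true: same content
        System.out.println(s.hashCode() == t.hashCode()); // true: computed from content
    }
}
```

If `Point` were used as a HashMap key, `new Point(1, 2)` would not find an entry stored under a different `new Point(1, 2)`, which is why key classes normally override both methods together.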

  1. A hash collision occurs when two separate values produce the same hash, as you might know. A hash function produces a fixed number of characters for any given value, so there is always a possibility of two values producing the same hash, however minute the probability. So we can say that it comes with the hash function itself. When using one, we accept the fact that two values may produce the same hash. As hard as it is to compute a hash collision, Google successfully computed a SHA-1 collision a few months ago, if I remember correctly. https://www.theregister.co.uk/2017/02/23/google_first_sha1_collision/

  2. I don't think I have enough knowledge about this.

  3. Yes. Suppose that for some kind of function we compute a hash to decide whether to run it. If a person unknowingly produces a hash collision, that particular function will run. This might cause a defect or failure in the system.

A hash collision is when two different files produce the same hash.

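Note: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.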

 