
Why does a hashtable degenerate into a linked list when a hashCode() implementation returns a constant value?

// The worst possible legal hash function - never use!
@Override public int hashCode() { return 42; }

It's legal because it ensures that equal objects have the same hash code. It's atrocious because it ensures that every object has the same hash code. Therefore, every object hashes to the same bucket, and hash tables degenerate to linked lists. Programs that should run in linear time instead run in quadratic time.

I am trying to understand the above (quoted from p. 47, Item 9, of Joshua Bloch's Effective Java).

The way I see it is as follows (consider the following code):

Map<String, String> h = new HashMap<String,String>();
h.put("key1", "value1");
h.put("key1", "value2");

What happens with the second h.put("key1", ...) call is as follows:
1. Get the hash code of key1.
2. Go to the bucket that corresponds to that hash code.
3. Within that bucket, invoke the equals method on each object to find out whether an equal key already exists.

This is faster because you first find the 'group' (bucket) of objects and then the actual object within it.
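The effect of those steps can be observed on the real java.util.HashMap, since put returns the value previously associated with the key. A minimal sketch follows; the class name PutTwiceDemo and the describe helper are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PutTwiceDemo {
    // Runs the two puts from the question and reports what each one returned.
    static String describe() {
        Map<String, String> h = new HashMap<>();
        String first = h.put("key1", "value1");   // no previous mapping, so put returns null
        String second = h.put("key1", "value2");  // equal key found in its bucket, old value returned
        return first + "," + second + "," + h.size() + "," + h.get("key1");
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints null,value1,1,value2
    }
}
```

The second put finds the existing entry through hashCode and equals, so the map still holds a single entry whose value has been replaced.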

Now, when the hashCode implementation returns the same integer (such as 42 above) for ALL objects, only one bucket is ever used, and the equals method has to be invoked one by one on each object in the entire HashMap/Hashtable. This is as bad as a linked list, because if the objects were in a linked list, one would likewise have to go through them one by one, comparing (calling equals on) each object.

Is that why it was said that hash tables degenerate into a linked list?

(I apologize for the verbosity of the above text. My grasp of the concepts is not yet clear enough for me to state it more succinctly.)

Yes, your understanding is accurate. It is not merely like a linked list, though: the internal implementation of entries that share a common bucket actually is a plain old linked list. The bucket holds the Map.Entry at the head of the list, and each entry has a forward pointer to the next occupant of its bucket. (This describes the HashMap implementation that is built into Java, of course.)

A hash table is an array with a mapping function (hashCode). When inserting into the array, you calculate the position and insert the element there.

But hashCode does not guarantee that every element gets a different position, so some objects may collide (have the same address), and the hash table has to resolve the collision. There are two common approaches to doing that.

Separate chaining

In separate chaining (used in Java), every index of the array holds a linked list, so every bucket (position) has unbounded capacity. Hence, if your hashCode returns only one value, you are using only one linked list => the hash table is a linked list.
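To make this concrete, here is a minimal sketch of separate chaining. It is not the java.util.HashMap source: ChainedMapSketch, ConstantKey and longestChainWith are hypothetical names, the table never resizes, and the bucket index is simply the hash code modulo the array length.

```java
class ChainedMapSketch<K, V> {
    // One entry in a bucket's chain, with a forward pointer to the next occupant.
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value, Node<K, V> next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Node<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedMapSketch(int capacity) {
        buckets = (Node<K, V>[]) new Node[capacity];
    }

    // Walks the chain calling equals; replaces the value on a match, else prepends a new node.
    V put(K key, V value) {
        int i = Math.floorMod(key.hashCode(), buckets.length);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                V old = n.value;
                n.value = value;
                return old;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]);
        return null;
    }

    // Length of the longest chain in the table.
    int longestChain() {
        int longest = 0;
        for (Node<K, V> head : buckets) {
            int len = 0;
            for (Node<K, V> n = head; n != null; n = n.next) len++;
            longest = Math.max(longest, len);
        }
        return longest;
    }

    // Hypothetical key whose hashCode is the constant from the question.
    static final class ConstantKey {
        final int id;
        ConstantKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) { return o instanceof ConstantKey && ((ConstantKey) o).id == id; }
        @Override public int hashCode() { return 42; }
    }

    // Inserts n distinct keys into a 16-bucket table and returns the longest chain.
    static int longestChainWith(boolean constantHash, int n) {
        ChainedMapSketch<Object, Integer> map = new ChainedMapSketch<>(16);
        for (int i = 0; i < n; i++) {
            Object key = constantHash ? new ConstantKey(i) : Integer.valueOf(i);
            map.put(key, i);
        }
        return map.longestChain();
    }

    public static void main(String[] args) {
        System.out.println(longestChainWith(true, 64));  // prints 64: every entry sits in one chain
        System.out.println(longestChainWith(false, 64)); // prints 4: entries spread over 16 buckets
    }
}
```

With the constant hash code, all 64 entries end up in a single chain, so every put and lookup walks what is effectively one linked list.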

Linear probing

The second approach is linear probing. In linear probing, the inner array is just a normal array of elements. When you find that a position is already occupied, you iterate over the array and place the new element at the first empty position.

So if your implementation of hashCode generates a constant value for every element, you generate nothing but collisions: you try to place every element at the same index, and because that index is always occupied, you iterate over all the already placed elements and put the new element at the end of that run. If you read that again, you will see that you are merely using a different (you could say implicit) implementation of a linked list.
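The cost of that iteration can be counted with a small sketch of linear probing. The names LinearProbingSketch, ConstantKey and probesFor are invented for this example, and the table is assumed never to fill up and never to have elements removed.

```java
class LinearProbingSketch {
    private final Object[] slots;
    private long probes; // occupied slots stepped over across all inserts

    LinearProbingSketch(int capacity) {
        slots = new Object[capacity];
    }

    // Places key in the first free slot at or after its home index, wrapping around.
    void add(Object key) {
        int i = Math.floorMod(key.hashCode(), slots.length);
        while (slots[i] != null) {
            if (slots[i].equals(key)) return; // already present
            probes++;
            i = (i + 1) % slots.length;
        }
        slots[i] = key;
    }

    long probes() { return probes; }

    // Hypothetical key whose hashCode is the constant from the question.
    static final class ConstantKey {
        final int id;
        ConstantKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) { return o instanceof ConstantKey && ((ConstantKey) o).id == id; }
        @Override public int hashCode() { return 42; }
    }

    // Inserts n distinct keys into a table of 2n slots and returns the total probe count.
    static long probesFor(boolean constantHash, int n) {
        LinearProbingSketch table = new LinearProbingSketch(2 * n);
        for (int i = 0; i < n; i++) {
            table.add(constantHash ? new ConstantKey(i) : Integer.valueOf(i));
        }
        return table.probes();
    }

    public static void main(String[] args) {
        System.out.println(probesFor(true, 100));  // prints 4950, i.e. 0 + 1 + ... + 99
        System.out.println(probesFor(false, 100)); // prints 0: every key lands on its own slot
    }
}
```

With the constant hash code, the k-th insert steps over the k elements placed before it, so n inserts cost n(n-1)/2 probes in total. That is the quadratic behaviour Bloch's quote refers to.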

Why not to do it

You really should not return a constant value, because hash tables are built to provide O(1) expected complexity for search and insert operations (thanks to the hash function, which returns a different address for (almost) every distinct object). If you return only one value, the implementation degrades to a linked list, with O(n) for both operations.

Hash tables -- when used correctly -- offer constant-time lookups on average. In terms of time complexity, constant time is as good as it gets.

Linked lists offer linear-time lookups. Linear time (i.e. looking at every element in turn) is as bad as it gets.

When a hash table is misused in the manner described by Bloch, its lookup behaviour degenerates into that of a linked list, simply because it effectively becomes a linked list.

Similar things can be said about other operations.
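One rough way to see the difference on the real java.util.HashMap is to count how often equals is called during lookups. The sketch below assumes a hypothetical Key class that can switch between a constant and a per-object hash code; EqualsCallCounter and lookupCost are likewise made-up names. Since Java 8, HashMap may turn an overfull bucket into a balanced tree, so the exact counts depend on the JDK version and none are shown here.

```java
import java.util.HashMap;
import java.util.Map;

class EqualsCallCounter {
    static long equalsCalls; // incremented by Key.equals

    // Hypothetical key; constantHash switches between "return 42" and a per-id hash code.
    static final class Key {
        final int id;
        final boolean constantHash;
        Key(int id, boolean constantHash) { this.id = id; this.constantHash = constantHash; }
        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int hashCode() { return constantHash ? 42 : id; }
    }

    // Fills a HashMap with n keys, then counts the equals calls made by n successful lookups.
    static long lookupCost(boolean constantHash, int n) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) map.put(new Key(i, constantHash), i);
        equalsCalls = 0;
        for (int i = 0; i < n; i++) {
            Integer found = map.get(new Key(i, constantHash));
            if (found == null || found != i) throw new IllegalStateException("lookup failed for " + i);
        }
        return equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println("constant hash: " + lookupCost(true, 1000) + " equals calls");
        System.out.println("per-id hash:   " + lookupCost(false, 1000) + " equals calls");
    }
}
```

The map stays correct in both cases, which is why the constant hash code is legal. The lookups just need many more equals calls when every key shares one bucket.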
