
Design a Hashtable

I was asked this question in an interview and was left stumped. Even though I came up with an answer, I didn't feel comfortable with my solution. I wanted to see how experts here feel about this question.

I am quoting the question exactly as the interviewer put it: "Design a hash table. You can use any data structure you want. I would like to see how you implement the O(1) lookup time." Finally he said it's more like simulating a hash table via another data structure.

Can anyone enlighten me with more information on this question? Thanks!

PS: My main reason for posting this question is to learn how an expert designer would start on the design for this problem. One more thing: I somehow cleared the interview based on the other questions that were asked, but this question stayed on my mind and I wanted to find out the answer!

It's a fairly standard interview question that shows you understand the underlying concepts behind useful Java data structures, like HashSets and HashMaps.

You would use an array of lists; these are normally referred to as buckets. You start your hashtable with a given capacity n, say n = 10, meaning you have an array of 10 lists (all empty).

To add an object to your hashtable you call the object's hashCode function, which gives you an int (a number in a pretty big range). You then take the hashCode modulo n to get the bucket it lives in. Since hashCode can be negative in Java, make sure the result is non-negative (for example with Math.floorMod). Add the object to the end of the list in that bucket.

To find an object you again use hashCode and the modulo to find the bucket, and then iterate through the list using .equals() to find the correct object.
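The add and find steps above can be sketched roughly as follows. This is only a minimal illustration in C++ rather than Java: std::hash and == stand in for hashCode and .equals(), and the class name ChainedHashSet is made up for the example. It also skips duplicates on add, which the description above does not require.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

// Hypothetical class: a set of strings stored in an array of buckets (lists).
class ChainedHashSet {
public:
    explicit ChainedHashSet(std::size_t capacity) : buckets_(capacity) {
        assert(capacity > 0);  // a modulo by zero would be undefined
    }

    // Hash the key, reduce it modulo the bucket count, append to that bucket.
    void add(const std::string& key) {
        if (!contains(key)) {
            buckets_[indexFor(key)].push_back(key);
        }
    }

    // Find the bucket in O(1), then scan its (ideally short) list with ==.
    bool contains(const std::string& key) const {
        for (const std::string& stored : buckets_[indexFor(key)]) {
            if (stored == key) return true;
        }
        return false;
    }

private:
    // std::hash returns an unsigned value, so the index is never negative.
    std::size_t indexFor(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

    std::vector<std::list<std::string>> buckets_;
};
```

With ChainedHashSet s(10), calling s.add("alice") and then s.contains("alice") touches exactly one bucket each time, which is where the near-constant lookup comes from.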

As the table gets fuller, you will end up doing more and more linear searching, so you will eventually need to re-hash. This means building an entirely new, larger table and putting the objects into it again.
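A re-hash could look something like the sketch below. The names Buckets, rehash and needsRehash are invented for illustration, and the 0.75 threshold is just an assumed trigger (it happens to be the default load factor of Java's HashMap), not something the approach requires.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <vector>

using Buckets = std::vector<std::list<std::string>>;

// Build a larger table and re-insert every stored key.
// Each key's bucket is recomputed because the modulus has changed.
Buckets rehash(const Buckets& old, std::size_t newCapacity) {
    assert(newCapacity > 0);
    Buckets bigger(newCapacity);
    for (const auto& bucket : old) {
        for (const std::string& key : bucket) {
            std::size_t index = std::hash<std::string>{}(key) % newCapacity;
            bigger[index].push_back(key);
        }
    }
    return bigger;
}

// Assumed trigger: grow once there are more than 0.75 items per bucket on average.
bool needsRehash(std::size_t itemCount, std::size_t capacity) {
    return itemCount > capacity * 3 / 4;
}
```

Re-hashing costs O(n) when it happens, but if the capacity is doubled each time it happens rarely enough that the average cost per insert stays constant.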

Instead of using a List in each array position, you can recalculate a different bucket position if the one you want is full; a common method is quadratic probing. This has the advantage of not needing any dynamic data structures like lists, but it is more complicated.
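For comparison, here is a rough sketch of the probing alternative. It assumes a fixed power-of-two capacity and uses the triangular-number variant of quadratic probing (offsets 0, 1, 3, 6, ...), because with a power-of-two table that sequence visits every slot. The class name ProbingHashSet is hypothetical, and deletion is left out since it would need tombstone markers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical fixed-size set using open addressing with quadratic probing.
class ProbingHashSet {
public:
    // Capacity must be a power of two so the probe sequence reaches every slot.
    explicit ProbingHashSet(std::size_t capacity)
        : keys_(capacity), used_(capacity, false) {
        assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
    }

    // Returns false only when the table is full and the key is absent.
    bool add(const std::string& key) {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            std::size_t index = probe(key, i);
            if (!used_[index]) {
                keys_[index] = key;
                used_[index] = true;
                return true;
            }
            if (keys_[index] == key) return true;  // already stored
        }
        return false;
    }

    bool contains(const std::string& key) const {
        for (std::size_t i = 0; i < keys_.size(); ++i) {
            std::size_t index = probe(key, i);
            if (!used_[index]) return false;  // an empty slot ends the probe chain
            if (keys_[index] == key) return true;
        }
        return false;
    }

private:
    // i-th slot to try: home position plus the i-th triangular number.
    std::size_t probe(const std::string& key, std::size_t i) const {
        std::size_t home = std::hash<std::string>{}(key) % keys_.size();
        return (home + i * (i + 1) / 2) % keys_.size();
    }

    std::vector<std::string> keys_;
    std::vector<bool> used_;
};
```

Everything lives in two flat arrays, so there are no per-item allocations, at the price of having to keep the table from filling up.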

You need an array of lists, or "buckets", for your values. Then you use a hash function to determine which array element to look in, and finally do a linear search through the list elements there.

You have constant-time lookup of the array location, and a linear search through the small list stored there.

If I had been in your shoes, I would have done the following:

  • Discuss what exactly a hashtable is and in what situations it should be used.
  • Discuss one of the implementations (e.g. the .NET Framework implementation) from the consumer's perspective.
  • Discuss how a hashtable functions internally with the interviewer. This is very important. You will be able to design it only if you know how a hashtable works.
  • Break the problem down: a. choice of data structure, b. choice of hashing function.
  • Use TDD (Test Driven Development) to design and implement the HashTable class. Only implement the functionality that you were asked for.

Consider the universe U (e.g. all possible IP addresses, all possible names, all possible mobile numbers, or all possible chess board configurations). You might have noticed that the universe U is very large.

The set S ⊆ U is of reasonable size, like the phone numbers of your friends.

Selecting a data structure for the implementation: without the right data structure, we will not get a good solution. We could use an array for quick insertion, deletion, and lookup, but it would take up a lot of space, as the size of the universe is very large. Also, your friend's name would have to be an integer, and the space requirement is proportional to the size of the universe.

On the other hand, we could use a linked list. This would only take up as much space as there are objects, i.e. the set S, but the 3 operations would not be O(1). To resolve this, we can use both.

So, the solution is to use the best of both worlds, i.e. the fast lookup of arrays and the small storage footprint of a linked list.

But these real-world entities need to be converted to integers by something called a hash function, so that they can be used as array indices. So, suppose you want to save your friend's name, alice: just convert the name to an integer.

Inserting alice:
std::size_t k = hashFunc("alice");  // hash the name to an array index
arr[k] = "alice";                   // this takes O(1) time

Lookup for alice:
std::size_t k = hashFunc("alice");  // same name, same index
std::string name = arr[k];
std::cout << name;                  // prints alice

Of course it is not that simple, but this is what I can explain right now. Please let me know wherever I am not clear. Thanks. For more information on hash tables, refer here.
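To make the "not that simple" part concrete: with one value per array slot, two names that hash to the same index overwrite each other. The sketch below shows this with a deliberately weak hash that looks only at the first letter. That hash, the 26-slot array and the NaiveTable name are assumptions made for the illustration, not part of the answer above.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Deliberately weak, hypothetical hash: index by the first letter only.
// Assumes a non-empty name that starts with a lowercase ASCII letter.
std::size_t hashFunc(const std::string& name) {
    return static_cast<std::size_t>(name[0] - 'a');
}

// One slot per letter, each slot holding a single name, as in arr[k] above.
struct NaiveTable {
    std::array<std::string, 26> arr;

    void insert(const std::string& name) { arr[hashFunc(name)] = name; }

    bool contains(const std::string& name) const {
        return arr[hashFunc(name)] == name;
    }
};
```

Inserting "alice" and then "adam" leaves only "adam" in slot 0, so a real table has to keep a list per slot or probe for another slot.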

A hash table provides a way to insert and retrieve data efficiently (usually in constant, O(1), time). For this we use a very large array to store the target values, and a hash function which maps the target values to hash values, which are nothing but valid indices into this large array. A hash function which maps each value to be stored to a unique key (or index in the table) is known as a perfect hash function. In practice, when there is no known way to obtain unique hash values (indices in the table) for the values to be stored, we use a hash function that maps each value to an index in such a way that collisions are kept to a minimum. Here a collision means that two or more items to be stored in the hash table map to the same hash value.
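As a small illustration of a perfect hash function, take a key set that is known in advance: the 26 lowercase letters. Subtracting 'a' gives every key its own index, so no slot is ever shared. The function names here are made up, and the sketch assumes an ASCII-compatible character set.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Perfect hash for the key set {'a', ..., 'z'}: each key gets its own index 0..25.
// Assumes an ASCII-compatible character set where the letters are contiguous.
std::size_t letterIndex(char c) {
    return static_cast<std::size_t>(c - 'a');
}

// Count lowercase letters; every table access is one array index, with no chains.
std::array<int, 26> countLetters(const std::string& text) {
    std::array<int, 26> counts{};
    for (char c : text) {
        if (c >= 'a' && c <= 'z') ++counts[letterIndex(c)];
    }
    return counts;
}
```

This only works because the key set is tiny and fixed. For arbitrary strings no such simple mapping is available, which is where collisions come in.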

Now coming to the original question, which is: "Design a hash table. You can use any data structure you want. I would like to see how you implement the O(1) lookup time." Finally he said it's more like simulating a hash table via another data structure.

Lookup is possible in exactly O(1) time if we can design a perfect hash function. The underlying data structure is still an array. But whether we can design a perfect hash function depends on the values to be stored. For example, consider strings over the English alphabet. A general-purpose hash function cannot be relied on to map every valid English word to a unique int (32-bit) or long long int (64-bit) value, so some collisions must be expected. To deal with collisions we can use the separate chaining method, in which each hash table slot stores a pointer to a linked list that holds all the items hashing to that slot or index. For example, consider a hash function which treats each English alphabet string as a number in base 26 (because there are 26 characters in the English alphabet). This can be coded as:

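#include <cctype>
#include <string>

const unsigned int tableSize = 144563;  // table size used for the results below

unsigned int hash(const std::string& word)
{
    unsigned int key = 0;
    for (unsigned char c : word)  // lowercase per character; word itself is const
    {
        // (key<<4) + (key<<3) + (key<<2) equals key * 28;
        // a strict base-26 reading would use key * 26 instead.
        key = (key << 4) + (key << 3) + (key << 2) + std::tolower(c);
        key = key % tableSize;  // keep key inside the table after every step
    }
    return key;
}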

Where tableSize is an appropriately chosen prime number just greater than the total number of English dictionary words to be stored in the hash table.

Following are the results with a dictionary of size 144554 and a table of size 144563:

[Items mapping to same cell --> Number of such slots in the hash table]

[0 --> 53278]
[1 --> 52962]
[2 --> 26833]
[3 -->  8653]
[4 -->  2313]
[5 -->   437]
[6 -->    78]
[7 -->     9]
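A table like this could be produced by something along the following lines. This is only a sketch of the counting step: the function name occupancy and the way the hash is passed in are assumptions, and it is not the code that generated the numbers above.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Returns a map from "items in a slot" to "number of slots with that many items".
// slotOf is the hash function; it must already return an index below tableSize.
// Duplicate words in the input would be counted as separate items.
std::map<std::size_t, std::size_t> occupancy(
        const std::vector<std::string>& words,
        std::size_t tableSize,
        const std::function<std::size_t(const std::string&)>& slotOf) {
    std::vector<std::size_t> perSlot(tableSize, 0);
    for (const std::string& w : words) ++perSlot[slotOf(w)];

    std::map<std::size_t, std::size_t> histogram;
    for (std::size_t count : perSlot) ++histogram[count];
    return histogram;
}
```

Feeding it the dictionary, the table size and the hash function under test yields one row per occupancy level, in the same shape as the listing above.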

In this case, for items that have been mapped to cells containing only one item, the lookup will be O(1). But if an item maps to a cell which has more than one item, we have to iterate through a linked list of 2 to 7 nodes before we find that element. So it's not constant in this case.

So whether the lookup can be performed in O(1) depends only on the availability of a perfect hash function. Otherwise it will not be exactly O(1), but very close to it.
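How close is "very close"? One rough way to check is to compute the average number of key comparisons for a successful lookup from the table of results above, assuming every stored word is equally likely to be looked up. The names kHistogram and averageComparisons are made up for this sketch, and the rows are used exactly as printed.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Each pair is (items in a slot, number of such slots), copied from the results table.
const std::vector<std::pair<std::size_t, std::size_t>> kHistogram = {
    {0, 53278}, {1, 52962}, {2, 26833}, {3, 8653},
    {4, 2313},  {5, 437},   {6, 78},    {7, 9}};

// Average key comparisons per successful lookup. In a chain of k items the item
// at position j costs j comparisons, so one chain contributes k(k+1)/2 in total.
double averageComparisons(
        const std::vector<std::pair<std::size_t, std::size_t>>& histogram) {
    std::size_t items = 0;
    std::size_t comparisons = 0;
    for (const auto& row : histogram) {
        items += row.first * row.second;
        comparisons += row.first * (row.first + 1) / 2 * row.second;
    }
    if (items == 0) return 0.0;
    return static_cast<double>(comparisons) / static_cast<double>(items);
}
```

For these rows averageComparisons(kHistogram) comes out at roughly 1.5, so a typical lookup compares one or two keys regardless of how many words are stored.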

Use an array => O(1)

So you'd use a hash function to turn your key into a number, then use that number as an index into an array to retrieve the value.
