
Hashmap hashcode to internal table index conversion

Hashmaps are usually implemented using an internal array (table) of buckets. When accessing a hashmap by key, we get the key's hashcode from a hash function specific to the key type. Then we need to map the hashcode to an actual index in the internal bucket table.

 key -> (hash function) -> hashcode -> (???) -> index in internal table

Sometimes the internal table shrinks or expands, depending on the hashmap's fill ratio. When that happens, the hashcode -> index conversion presumably has to change as well.

For example, say our hash function returns a 32-bit unsigned integer value, and:

moment A: internal table has capacity 10000

moment B: internal table has capacity 100000

What algorithms or approaches are usually used to perform the hashcode -> internal table index conversion? How do they handle table resizing?

Usually, a simple modulo will do the job.

To take a quick example from Wikipedia, it's as simple as this:

hash = hashfunc(key)
index = hash % array_size
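Applied to the two moments in the question, the modulo step could look like the minimal Python sketch below. The function name `bucket_index` and the sample hashcode are made up for illustration; the mask only stands in for the question's assumption that hashcodes are 32-bit unsigned values.

```python
def bucket_index(hashcode: int, capacity: int) -> int:
    """Map a hashcode to a slot in a table with `capacity` buckets."""
    # Keep only the low 32 bits, mirroring the "32-bit unsigned" assumption.
    return (hashcode & 0xFFFFFFFF) % capacity


hashcode = 123456789  # arbitrary sample value, not from the original post

index_a = bucket_index(hashcode, 10_000)   # moment A: capacity 10000
index_b = bucket_index(hashcode, 100_000)  # moment B: capacity 100000

print(index_a, index_b)  # 6789 56789
```

The same hashcode lands in a different slot once the capacity changes, which is why existing entries have to be re-indexed after a resize.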

As you said, resizing happens depending on the hashmap's fill ratio. The array is reallocated (see realloc()), then the indices are recalculated for the new array size, and the values are copied to their new indices.
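A rough sketch of that resize-and-reindex step, written in Python rather than with realloc(). The class name `ChainedHashMap`, the separate-chaining layout, the doubling policy, and the 0.75 fill threshold are all assumptions chosen for illustration, not details from the answer.

```python
class ChainedHashMap:
    """Toy separate-chaining hashmap that grows when it gets too full."""

    def __init__(self, capacity=8, max_load=0.75):
        self.capacity = capacity
        self.max_load = max_load  # illustrative fill-ratio threshold
        self.size = 0
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key, capacity):
        # Python's % returns a non-negative result for a positive modulus.
        return hash(key) % capacity

    def put(self, key, value):
        bucket = self.buckets[self._index(key, self.capacity)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self.size += 1
        if self.size > self.max_load * self.capacity:
            self._resize(self.capacity * 2)

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key, self.capacity)]:
            if existing_key == key:
                return value
        return default

    def _resize(self, new_capacity):
        # Allocate the new table, then recompute every index against it.
        new_buckets = [[] for _ in range(new_capacity)]
        for bucket in self.buckets:
            for key, value in bucket:
                new_buckets[self._index(key, new_capacity)].append((key, value))
        self.buckets = new_buckets
        self.capacity = new_capacity


table = ChainedHashMap(capacity=4)
for n in range(100):
    table.put(n, n * n)
```

The hashcodes themselves never change here; only the modulus does, so a resize is a full pass that moves each entry to its new bucket.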

I wrote about this here and here.

When you increase the size of your vector of indices, you can be sure that the algorithm that worked well on the shorter vector will work less well on the longer one. It is possible to test beforehand and have new algorithms ready to put in place when you make the vector longer. Or, as the number of occupied indices in the current vector increases, have a background, lower-priority thread test different algorithms on the data.

As the example in one of my answers shows, a "new algorithm" need be nothing more than a different pair of matched prime numbers.
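The linked answers are not reproduced here, so the exact scheme is unknown. Purely as an assumed interpretation, the sketch below treats a "pair" as a prime multiplier plus a prime table size, computes the index as (hashcode * multiplier) % table_size, and picks the candidate pair that produces the fewest collisions on the data at hand. All names and numbers are hypothetical.

```python
from collections import Counter


def index_for(hashcode, multiplier, table_size):
    """Assumed scheme: scale by one prime, reduce modulo another."""
    return (hashcode * multiplier) % table_size


def collisions(hashcodes, multiplier, table_size):
    """Count entries that land in a slot already taken by an earlier entry."""
    counts = Counter(index_for(h, multiplier, table_size) for h in hashcodes)
    return sum(count - 1 for count in counts.values())


def pick_pair(hashcodes, candidate_pairs):
    """Return the (multiplier, table_size) pair with the fewest collisions."""
    return min(candidate_pairs, key=lambda pair: collisions(hashcodes, *pair))


# Contrived data: every hashcode is a multiple of 10007, so a table of
# size 10007 sends all of them to slot 0.
sample_hashcodes = [n * 10007 for n in range(100)]
candidates = [(31, 10007), (37, 10009)]

best = pick_pair(sample_hashcodes, candidates)
```

A check like this could run ahead of a resize, or in the background thread mentioned above, so that a better pair is already known when the table grows.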
