What's the logic behind Python's hash function order?
As we know, some of Python's data structures, such as set and dictionary, use hash tables to store their items, so these objects have no defined order. But for some sequences of numbers that does not seem to be true.
For example, consider the following:
>>> set([7,2,5,3,6])
set([2, 3, 5, 6, 7])
>>> set([4,5,3,0,1,2])
set([0, 1, 2, 3, 4, 5])
But the result isn't sorted if we make a small change:
>>> set([8,2,5,3,6])
set([8, 2, 3, 5, 6])
So the question is: how does Python's hash function work on integer sequences? Although there are a lot of questions on SO about hash and its order, none of them explains the algorithm of the hash function.
So all you need to know here is how Python calculates the indices in a hash table.
If you go through the hashtable.c file in the CPython source code, you'll see the following lines in the _Py_hashtable_set function, which show how Python calculates the index of a hash table key:
key_hash = ht->hash_func(key);
index = key_hash & (ht->num_buckets - 1);
Since the hash value of an integer is the integer itself* (except for -1), the index depends on the number and on the size of the data structure's table: it is calculated as the bitwise AND of (ht->num_buckets - 1) and the number.
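The masking step can be tried directly in Python. This is only a minimal sketch: bucket_index is a hypothetical helper written for illustration, not a CPython function, and it assumes the table size is a power of two so that (num_buckets - 1) works as a bit mask.

```python
def bucket_index(key, num_buckets=8):
    """Return the initial slot for `key` in a table of `num_buckets` slots.

    Hypothetical helper that mirrors `key_hash & (num_buckets - 1)`.
    Assumes num_buckets is a power of two.
    """
    if num_buckets <= 0 or num_buckets & (num_buckets - 1):
        raise ValueError("num_buckets must be a power of two")
    return hash(key) & (num_buckets - 1)


print(bucket_index(33))      # slot in an assumed 8-slot table
print(bucket_index(33, 32))  # slot in an assumed 32-slot table
```

The same key can land in a different slot when the table has a different size, because the mask keeps more or fewer of the low bits.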
Now consider the following example with set, which uses a hash table:
>>> set([0,1919,2000,3,45,33,333,5])
set([0, 33, 3, 5, 45, 333, 2000, 1919])
For the number 33 we have:
33 & (ht->num_buckets - 1) = 1
Which is actually:
'0b100001' & '0b111'= '0b1' # 1 the index of 33
Note that in this case (ht->num_buckets - 1) is 8-1=7, or 0b111.
And for 1919:
'0b11101111111' & '0b111' = '0b111' # 7 the index of 1919
And for 333:
'0b101001101' & '0b111' = '0b101' # 5 the index of 333
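The three calculations above can be checked with real integers rather than strings. This sketch assumes Python 3 and the 8-slot table (mask 0b111) used in the text; MASK and indices are names chosen here for illustration.

```python
MASK = 0b111  # assumed: num_buckets - 1 for an 8-slot table

indices = {}
for number in (33, 1919, 333):
    index = number & MASK  # keep only the three lowest bits
    indices[number] = index
    print(f"{bin(number)} & {bin(MASK)} = {bin(index)}  # index {index}")
```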
The same applies to the earlier example from the question:
>>> set([8,2,5,3,6])
set([8, 2, 3, 5, 6])
'0b1000' & '0b111' = '0b0' # 0 the index of 8
'0b110' & '0b111' = '0b110' # 6 the index of 6
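The ordering seen in the question's examples can be reproduced with a deliberately simplified model. predicted_order is a hypothetical function written for this illustration. It assumes an 8-slot table, distinct small non-negative integers, no collisions, and no resizing, so it is not a description of how CPython's set behaves in general.

```python
def predicted_order(numbers, num_buckets=8):
    """Order distinct integers by their initial slot in the table.

    Simplified model: raises if two numbers map to the same slot,
    and never resizes the table.
    """
    mask = num_buckets - 1
    slots = {}
    for n in numbers:
        index = hash(n) & mask
        if index in slots:
            raise ValueError(f"{n} collides with {slots[index]} in slot {index}")
        slots[index] = n
    # Iterating a hash table walks the slots in index order.
    return [slots[i] for i in sorted(slots)]


print(predicted_order([7, 2, 5, 3, 6]))
print(predicted_order([8, 2, 5, 3, 6]))
```

Under these assumptions 8 falls into slot 0, which is why it comes first even though it is the largest value.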
* The hash function for class int:
class int:
    def __hash__(self):
        value = self
        if value == -1:
            value = -2
        return value
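The footnote is easy to confirm in an interpreter. This is a small check assuming CPython and small integers; very large integers are hashed differently and are outside this sketch.

```python
# Small integers hash to themselves in CPython...
for n in (0, 7, 1919, -2):
    print(n, hash(n))

# ...except -1, whose hash is -2.
print(-1, hash(-1))
```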