
What's the logic behind Python's hash function order?

As we know, some of Python's data structures, such as set and dictionary, use hash tables to store their items, so these objects have no defined order. But for some sequences of numbers that does not seem to be true.

For example, consider the following:

>>> set([7,2,5,3,6])
set([2, 3, 5, 6, 7])

>>> set([4,5,3,0,1,2])
set([0, 1, 2, 3, 4, 5])

But the result is no longer sorted if we make a small change:

>>> set([8,2,5,3,6])
set([8, 2, 3, 5, 6])

So the question is: how does Python's hash function work on integer sequences?

Although there are a lot of questions on SO about hash and its order, none of them explains the algorithm of the hash function.

So all you need to know here is how Python calculates the indices in the hash table.

If you go through the hashtable.c file in the CPython source code, you'll see the following lines in the _Py_hashtable_set function, which show how Python calculates the index of hash table keys:

key_hash = ht->hash_func(key);
index = key_hash & (ht->num_buckets - 1);

Since the hash value of an integer is the integer itself* (except for -1), the index depends on the number and on the size of your data structure (ht->num_buckets - 1). It is calculated with a bitwise AND between (ht->num_buckets - 1) and the number.

Now consider the following example with a set, which uses a hash table:

>>> set([0,1919,2000,3,45,33,333,5])
set([0, 33, 3, 5, 45, 333, 2000, 1919])

For the number 33 we have:

33 & (ht->num_buckets - 1) = 1

Which is actually:

'0b100001' & '0b111'= '0b1' # 1 the index of 33

Note that this calculation takes (ht->num_buckets - 1) to be 8-1=7, or 0b111.

And for 1919:

'0b11101111111' & '0b111' = '0b111' # 7 the index of 1919

And for 333:

'0b101001101' & '0b111' = '0b101' # 5 the index of 333
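
The three calculations above can be checked with a short Python sketch. The helper name `bucket_index` and its default of 8 buckets are assumptions made for illustration. It only mirrors the masking step and does not model collisions or table resizing, both of which a real set has to deal with.

```python
def bucket_index(key, num_buckets=8):
    """Mimic `key_hash & (num_buckets - 1)` for a power-of-two table size.

    Hypothetical helper: it ignores collisions and resizing.
    """
    return hash(key) & (num_buckets - 1)


for number in (33, 1919, 333):
    index = bucket_index(number)
    # Show the number in binary, then its masked index in binary and decimal.
    print(number, bin(number), "->", bin(index), index)
```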

The same applies to the example from the question:

>>> set([8,2,5,3,6])
set([8, 2, 3, 5, 6])

'0b1000' & '0b111' = '0b0' # 0 the index of 8
'0b110' & '0b111' = '0b110' # 6 the index of 6
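
As a rough check, the ordering seen in the question can be reproduced by sorting the keys by their masked index. The function name `predicted_order` is hypothetical, and the sketch assumes an 8-bucket table in which no two keys land in the same bucket. It is not how CPython iterates a set in general, only an illustration of the masking idea for these small inputs.

```python
def predicted_order(keys, num_buckets=8):
    """Order distinct integer keys by `hash(key) & (num_buckets - 1)`.

    Hypothetical sketch: assumes no two keys share a bucket and that the
    table is never resized, which real sets do not guarantee.
    """
    mask = num_buckets - 1
    slots = {}
    for key in keys:
        index = hash(key) & mask
        if index in slots:
            raise ValueError("collision; this sketch does not model probing")
        slots[index] = key
    # Iterating a table walks its buckets from index 0 upwards.
    return [slots[index] for index in sorted(slots)]


print(predicted_order([7, 2, 5, 3, 6]))  # [2, 3, 5, 6, 7]
print(predicted_order([8, 2, 5, 3, 6]))  # [8, 2, 3, 5, 6]
```

Here 8 lands in bucket 0 because its low three bits are all zero, which is why it comes first even though it is the largest value.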

* The hash function for class int:

class int:
    def __hash__(self):
        value = self
        if value == -1:
            value = -2
        return value
