What's the logic behind Python's hash function order?

Question

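As we know, some of Python's data structures, such as set and dictionary, use hash tables to store their items, so these objects have no defined order. For some sequences of numbers, however, that does not seem to be the case.

For example, consider the following: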

>>> set([7,2,5,3,6])
set([2, 3, 5, 6, 7])

>>> set([4,5,3,0,1,2])
set([0, 1, 2, 3, 4, 5])

But if we make a small change, the result is no longer sorted:

>>> set([8,2,5,3,6])
set([8, 2, 3, 5, 6])

So the question is: how does Python's hash function work on integer sequences?

Answer 1

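Although there are many questions on SO about hash and its order, none of them explains the algorithm behind the hash function.

So all you need to know here is how Python calculates the indices in a hash table.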

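If you go through the hashtable.c file in the CPython source code, you'll see the following lines in the _Py_hashtable_set function, which show how Python calculates the index of a hash table key (set and dict have their own implementations in setobject.c and dictobject.c, which pick the initial slot with the same kind of mask):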

key_hash = ht->hash_func(key);
index = key_hash & (ht->num_buckets - 1);

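Since the hash value of an integer is the integer itself* (except for -1), the index depends on the number and on the length of your data structure (ht->num_buckets - 1), and it is calculated as the bitwise AND of (ht->num_buckets - 1) and the number.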
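The masking step can be reproduced in plain Python. This is a minimal sketch rather than CPython's own code: bucket_index is a hypothetical helper, and the table size of 8 is an assumption chosen for illustration.

```python
def bucket_index(key, num_buckets):
    """Return the initial slot for `key` in a table of `num_buckets` slots.

    Hypothetical helper mirroring `key_hash & (ht->num_buckets - 1)`.
    `num_buckets` is assumed to be a power of two, so that
    `num_buckets - 1` is a run of one-bits.
    """
    if num_buckets <= 0 or num_buckets & (num_buckets - 1):
        raise ValueError("num_buckets must be a power of two")
    return hash(key) & (num_buckets - 1)


# With 8 slots the mask is 0b111, so only the low three bits of the hash matter.
for key in (10, 11, 12, -5):
    print(key, bucket_index(key, 8))
```

For a power-of-two table size the mask gives the same result as hash(key) % num_buckets, which is why consecutive small integers land in consecutive slots.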

Now consider the following example with a set, which uses a hash table:

>>> set([0,1919,2000,3,45,33,333,5])
set([0, 33, 3, 5, 45, 333, 2000, 1919])

For the number 33 we have:

33 & (ht->num_buckets - 1) = 1

which is actually:

'0b100001' & '0b111'= '0b1' # 1 the index of 33

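Note that in this case (ht->num_buckets - 1) is taken to be 8-1=7, or 0b111. CPython may already have grown the table for a set with this many elements, in which case the mask is wider, but the calculation works the same way.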

And for 1919:

'0b11101111111' & '0b111' = '0b111' # 7 the index of 1919

And for 333:

'0b101001101' & '0b111' = '0b101' # 5 the index of 333
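The three calculations above can be checked directly in the interpreter. This is a small sketch that assumes an 8-slot table, so the mask is 0b111; MASK and slots are names introduced here for illustration.

```python
MASK = 0b111  # assumed: 8 slots, so num_buckets - 1 == 7

slots = {}
for number in (33, 1919, 333):
    index = number & MASK  # keep only the low three bits
    slots[number] = index
    print("{} & {} = {}  # slot {}".format(bin(number), bin(MASK), bin(index), index))
```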

And likewise for the earlier example from the question:

>>> set([8,2,5,3,6])
set([8, 2, 3, 5, 6])

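'0b1000' & '0b111' = '0b0' # 0 the index of 8
'0b110' & '0b111' = '0b110' # 6 the index of 6

So 8 lands in slot 0, ahead of 2, 3, 5 and 6, which is why it comes first in the output.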
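To see why one set from the question looks sorted and the other does not, the slot assignment can be simulated. The sketch below is a simplified model, not CPython's set implementation: iteration_order is a hypothetical helper, the table is assumed to have 8 slots, and collisions and resizing are deliberately left out.

```python
def iteration_order(values, num_buckets=8):
    """Place small ints into slots by `hash(v) & mask` and read the table back.

    Simplified model: `num_buckets` is assumed to be a power of two, the
    table never grows, and a collision raises instead of probing for
    another slot as a real hash table would.
    """
    mask = num_buckets - 1
    table = [None] * num_buckets
    for v in values:
        slot = hash(v) & mask
        if table[slot] is not None and table[slot] != v:
            raise ValueError("collision at slot %d" % slot)
        table[slot] = v
    # Iterating a hash table walks the slots from first to last.
    return [v for v in table if v is not None]


print(iteration_order([7, 2, 5, 3, 6]))     # [2, 3, 5, 6, 7]
print(iteration_order([4, 5, 3, 0, 1, 2]))  # [0, 1, 2, 3, 4, 5]
print(iteration_order([8, 2, 5, 3, 6]))     # [8, 2, 3, 5, 6]
```

Integers smaller than the table size sit in the slot equal to their own value, so reading the slots in order returns them sorted. 8 wraps around to slot 0 and therefore comes out first.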

* The hash function of class int:

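# Illustrative rendering of the logic only, not code meant to replace the built-in int.
class int:
    def __hash__(self):
        value = self
        if value == -1:
            # -1 is reserved as an error return value in CPython's C API
            value = -2
        return value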
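The behaviour described by this class can be observed with the built-in hash(). This is a quick check, assuming CPython and integers small enough to fit in the hash width (very large integers are reduced first, so the identity does not hold for them).

```python
# For small integers hash(n) is n itself, with -1 remapped to -2.
small_ints = (0, 1, 42, 1919, -2)
for n in small_ints + (-1,):
    print(n, hash(n))

identity_holds = all(hash(n) == n for n in small_ints)
minus_one_hash = hash(-1)
```

One consequence is that -1 and -2 share a hash value, so they always start out competing for the same slot.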