Why does Python's hash of infinity have the digits of π?
Summary: It's not a coincidence; _PyHASH_INF is hardcoded as 314159 in the default CPython implementation of Python, and was picked as an arbitrary value (obviously from the digits of π) by Tim Peters in 2000.
The value of hash(float('inf')) is one of the system-dependent parameters of the built-in hash function for numeric types, and is also available as sys.hash_info.inf in Python 3:
>>> import sys
>>> sys.hash_info
sys.hash_info(width=64, modulus=2305843009213693951, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0)
>>> sys.hash_info.inf
314159
(Same results with PyPy too.)
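The same relationship can be checked from a script rather than the interactive prompt. This is a minimal sketch, assuming Python 3.2 or later (where sys.hash_info exists); the variable names are only for illustration.

```python
import sys

pos_inf = float("inf")
neg_inf = float("-inf")

# Show the hashes of both infinities next to the published constant.
print(hash(pos_inf), hash(neg_inf), sys.hash_info.inf)

# On Python 3 the two hashes are the constant and its negation.
hashes_match = (
    hash(pos_inf) == sys.hash_info.inf
    and hash(neg_inf) == -sys.hash_info.inf
)
print(hashes_match)
```

Comparing against sys.hash_info.inf instead of the literal 314159 keeps the check valid on any implementation that publishes its own constant.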
In terms of code, hash is a built-in function. Calling it on a Python float object invokes the function whose pointer is given by the tp_hash attribute of the built-in float type (PyTypeObject PyFloat_Type), which is the float_hash function, defined as return _Py_HashDouble(v->ob_fval), which in turn has
if (Py_IS_INFINITY(v))
return v > 0 ? _PyHASH_INF : -_PyHASH_INF;
where _PyHASH_INF is defined as 314159:
#define _PyHASH_INF 314159
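For readers who don't read C, the special case above can be modelled in Python. This is only a sketch of the infinity branch, not of the whole _Py_HashDouble algorithm; the names PYHASH_INF and hash_infinity_branch are made up for illustration.

```python
import math

# Mirrors the constant from the C source quoted above (illustrative name).
PYHASH_INF = 314159


def hash_infinity_branch(v: float):
    """Model only the infinity special case of _Py_HashDouble.

    Returns None for finite values and NaN, because the rest of the
    hashing algorithm is deliberately not reproduced here.
    """
    if math.isinf(v):
        return PYHASH_INF if v > 0 else -PYHASH_INF
    return None


if __name__ == "__main__":
    for x in (float("inf"), float("-inf")):
        # The model and the built-in hash should agree on Python 3.
        print(x, hash_infinity_branch(x), hash(x))
```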
In terms of history, the first mention of 314159 in this context in the Python code (you can find this with git bisect or git log -S 314159 -p) was added by Tim Peters in August 2000, in what is now commit 39dce293 in the cpython git repository.
The commit message says:
Fix for http://sourceforge.net/bugs/?func=detailbug&bug_id=111866&group_id=5470 . This was a misleading bug -- the true "bug" was that hash(x) gave an error return when x is an infinity. Fixed that. Added new Py_IS_INFINITY macro to pyport.h. Rearranged code to reduce growing duplication in hashing of float and complex numbers, pushing Trent's earlier stab at that to a logical conclusion. Fixed exceedingly rare bug where hashing of floats could return -1 even if there wasn't an error (didn't waste time trying to construct a test case, it was simply obvious from the code that it could happen). Improved complex hash so that hash(complex(x, y)) doesn't systematically equal hash(complex(y, x)) anymore.
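Two of the points in that commit message can still be observed from Python. The sketch below assumes CPython, where -1 is reserved as the error return of C-level hash functions and is therefore never produced as an actual hash value; other implementations may differ.

```python
# -1 signals an error at the C level, so CPython substitutes -2 for it.
hash_of_minus_one_int = hash(-1)
hash_of_minus_one_float = hash(-1.0)
print(hash_of_minus_one_int, hash_of_minus_one_float)

# Swapping the real and imaginary parts changes the hash, so complex(x, y)
# and complex(y, x) no longer collide systematically.
hash_xy = hash(complex(1.0, 2.0))
hash_yx = hash(complex(2.0, 1.0))
print(hash_xy, hash_yx, hash_xy != hash_yx)
```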
In particular, in this commit he ripped out the code of static long float_hash(PyFloatObject *v) in Objects/floatobject.c and made it just return _Py_HashDouble(v->ob_fval). In the definition of long _Py_HashDouble(double v) in Objects/object.c, he added the lines:
if (Py_IS_INFINITY(intpart))
/* can't convert to long int -- arbitrary */
v = v < 0 ? -271828.0 : 314159.0;
So as mentioned, it was an arbitrary choice. Note that 271828 is formed from the first few decimal digits of e.
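Both constants can be reproduced by truncating π and e to six significant digits. The sketch below also models the substitution made by the lines added in 2000; legacy_infinity_value is a hypothetical name, and the function covers only that one assignment, not the hashing that followed it.

```python
import math

# First six significant digits of pi and e, obtained by truncation.
pi_digits = int(math.pi * 10**5)
e_digits = int(math.e * 10**5)
print(pi_digits, e_digits)


def legacy_infinity_value(v: float) -> float:
    """Model of `v = v < 0 ? -271828.0 : 314159.0` (illustrative only)."""
    return -271828.0 if v < 0 else 314159.0


print(legacy_infinity_value(float("inf")), legacy_infinity_value(float("-inf")))
```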
Related later commits:
By Mark Dickinson in Apr 2010, making the Decimal type behave similarly
By Mark Dickinson in Apr 2010, moving this check to the top and adding test cases
By Mark Dickinson in May 2010 as issue 8188, completely rewriting the hash function to its current implementation, but retaining this special case and giving the constant the name _PyHASH_INF (also removing the 271828, which is why in Python 3 hash(float('-inf')) returns -314159 rather than -271828 as it does in Python 2)
By Raymond Hettinger in Jan 2011, adding an explicit example of sys.hash_info to the "What's new" for Python 3.2, showing the above value
By Stefan Krah in Mar 2012, modifying the Decimal module but keeping this hash
By Christian Heimes in Nov 2013, moving the definition of _PyHASH_INF from Include/pyport.h to Include/pyhash.h, where it now lives
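The Decimal-related commits in this list are why infinities of different numeric types hash alike. A quick check, assuming Python 3.2 or later so that sys.hash_info is available and Decimal compares equal to float:

```python
import sys
from decimal import Decimal

float_inf = float("inf")
decimal_inf = Decimal("Infinity")

# Objects that compare equal must hash equal, so both infinities
# share the same hardcoded constant.
print(float_inf == decimal_inf)
print(hash(float_inf), hash(decimal_inf), sys.hash_info.inf)
print(hash(-float_inf), hash(-decimal_inf))
```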
_PyHASH_INF is defined as a constant equal to 314159.
I can't find any discussion about this, or comments giving a reason. I think it was chosen more or less arbitrarily. I imagine that as long as they don't use the same meaningful value for other hashes, it shouldn't matter.
Indeed, sys.hash_info.inf returns 314159. The value is not generated, it's built into the source code. In fact, hash(float('-inf')) returns -271828, the leading digits of -e, in Python 2 (it's -314159 now).
The fact that the two most famous irrational numbers of all time are used as the hash values makes it very unlikely to be a coincidence.