Is the builtin hash method of Python 2.6 stable across architectures?

I need to compute a hash that is stable across architectures. Is Python's hash() stable?

To be more specific, the example below shows hash() computing the same value on two different hosts/architectures:

# on OSX based laptop
>>> hash((1,2,3,4))
485696759010151909
# on x86_64 Linux host
>>> hash((1,2,3,4))
485696759010151909

The above holds for at least those inputs, but my question is about the general case.

If you need a well-defined hash, you can use hashlib.

The hash() function is not what you want; finding a reliable way to serialize the object (e.g. str() or repr()) and running it through hashlib.md5() would probably be much preferable.

In detail: hash() is designed to return an integer that is only guaranteed to stay the same for an object within its lifetime (and it does not have to be unique). Once the program is run again, constructing the same object may in fact produce a different hash. Once an object is destroyed, there's a chance another object will have that hash in the future. See Python's definition of hashable for more.

Behind the scenes, most user-defined Python objects fall back to id() to provide their hash value. While you're not supposed to make use of this, id(obj), and thus hash(obj), is usually implemented (e.g. in CPython) as the memory address of the underlying Python object. Thus you can see why it can't be relied on for anything.
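As a rough illustration of the identity-based fallback described above, here is a minimal sketch. The Point class is a made-up example, and the exact hash values depend on the interpreter and on where the objects happen to live in memory, so none are shown.

```python
class Point(object):
    """A plain user-defined class that defines neither __eq__ nor __hash__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


a = Point(1, 2)
b = Point(1, 2)

# Within one process, the hash of a live object does not change.
same_within_lifetime = hash(a) == hash(a)

# a and b hold the same data but are distinct objects, so their
# identity-based hashes normally differ (assumption: CPython-like behavior).
distinct_objects_differ = hash(a) != hash(b)

print(same_within_lifetime, distinct_objects_differ)
```

Because the value is derived from object identity rather than from x and y, a second run of the program (or a different machine) has no reason to reproduce it.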

The behavior you currently see is only reliable for certain builtin Python objects, and even then not very far. hash({}), for instance, is not possible, because dictionaries are unhashable.
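A quick way to see this for yourself; the variable names here are only for illustration:

```python
# Dictionaries are mutable, so hash() refuses them with a TypeError.
try:
    hash({})
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

# A tuple of hashable items works, but the value is interpreter-specific.
tuple_hash = hash((1, 2, 3, 4))

print(dict_is_hashable)
```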


Regarding hashlib.md5(str(obj)) or equivalent: you'll need to make sure str(obj) is reliably the same. In particular, if a dictionary is rendered within that string, it may not list its keys in the same order. There may also be subtle differences between Python versions... I would definitely recommend unit tests for any implementation you rely on.
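One possible way to get a reliably identical string is to serialize with sorted keys before hashing. The sketch below swaps str() for json.dumps(), which is my substitution rather than something the answer prescribes; stable_digest is a hypothetical helper name, and it only handles JSON-compatible data (tuples, for instance, come out as lists).

```python
import hashlib
import json


def stable_digest(obj):
    """Hypothetical helper: MD5 hex digest of a canonical JSON rendering."""
    # sort_keys=True fixes the key order; the explicit separators remove
    # any whitespace variation between versions.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


d1 = {"b": 2, "a": 1}
d2 = {"a": 1, "b": 2}

# Same contents, different insertion order: the digests still match.
print(stable_digest(d1) == stable_digest(d2))
```

A unit test that pins the digest of a few known inputs is a cheap way to catch differences between Python versions.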

No.

x86_64
>>> print hash("a")
12416037344

i386
>>> print hash("a")
-468864544

If you need a stable hash, create a digest of your data using something like SHA-1, which can be found in hashlib.
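A minimal sketch of that suggestion follows. stable_hash is a hypothetical helper name, and the repr() step assumes the tuple contains only small integers, whose repr() is the same across platforms.

```python
import hashlib


def stable_hash(data):
    """Hypothetical helper: SHA-1 hex digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


# The digest depends only on the bytes, not on the architecture.
digest = stable_hash(b"a")

# Non-bytes objects have to be serialized first.
tuple_digest = stable_hash(repr((1, 2, 3, 4)).encode("ascii"))

# If an integer is needed, as with hash(), derive it from the digest.
as_int = int(digest, 16)

print(digest)
```

Unlike the hash("a") values above, this digest of "a" is the same on x86_64, i386 and ARM.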

Using Python 2.6 on ARM:

>>> hash((1,2,3,4))
89902565
