Python hashes differ between threads
I'm using Python 3.6 on Windows and have a parallelizable task that includes computing string hashes. This is basically a minimal version of my problem:
#!/usr/bin/env python3
from joblib import Parallel, delayed

def hash_some(foo):
    return hash(foo)

def main():
    hashes = Parallel(n_jobs=10)(delayed(hash_some)(s) for s in ['a', 'a', 'a'])
    print(hashes)

if __name__ == '__main__':
    main()
Now, for some reason this prints, e.g., the following:
[3220780809080710068, -561460911962106608, -1551910331007446174]
These clearly should all be the same.
The hashes don't always differ, and especially for a lower n_jobs value they often turn out the same, but this is not guaranteed.
I know hash() uses a random seed per program invocation, but why does it apparently use a different seed per thread? Is there any way I can set a fixed (but random) seed for all my threads? (I know about PYTHONHASHSEED=0, but I'd prefer to find an in-code solution.)
As you've already explained, randomization of hash() can be controlled with PYTHONHASHSEED; for more information, have a read of this. Note that with n_jobs greater than 1, joblib's default backend runs the jobs in separate worker processes rather than threads, and each newly started interpreter picks its own random hash seed unless PYTHONHASHSEED is set in its environment. That is why the values differ. Now, if you want to control the behaviour from code rather than with the Python interpreter options or by exporting that environment variable, a possible solution could be something like this:
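#!/usr/bin/env python3
import os
from joblib import Parallel, delayed

# Set before any worker is started: worker processes inherit this
# environment, so they all use the same hash seed. It does not change
# the seed of the already running parent interpreter.
os.environ['PYTHONHASHSEED'] = '0'

def hash_some(foo):
    return hash(foo)

def main():
    # Hash the same one-character string 10000 times across the workers.
    hashes = Parallel(n_jobs=10)(delayed(hash_some)(s) for s in 'a' * 10000)
    # A single distinct value means every worker agreed.
    print(set(hashes))

if __name__ == '__main__':
    main()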
If you comment out the os.environ line, you'll see that the length of the final set will most likely no longer be 1.
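The per-process seeding described above can also be observed without joblib. The following is only a minimal sketch using the standard library: the helper name hash_in_new_process is made up for this illustration, and it assumes that starting a fresh interpreter with subprocess is a fair stand-in for what a process-based worker does.

```python
import os
import subprocess
import sys


def hash_in_new_process(text, seed=None):
    """Return hash(text) as computed by a freshly started interpreter.

    Hypothetical helper for illustration only; not part of joblib.
    """
    env = dict(os.environ)
    # Drop any inherited setting so the parent's value cannot leak in.
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        stdout=subprocess.PIPE,  # works on Python 3.6, unlike capture_output
        check=True,
    )
    return int(result.stdout.decode().strip())


if __name__ == "__main__":
    # No seed: each interpreter picks its own random seed,
    # so the three values will usually differ.
    print([hash_in_new_process("a") for _ in range(3)])
    # Fixed seed: every interpreter computes the same value.
    print([hash_in_new_process("a", seed=0) for _ in range(3)])
```

If the first list shows differing values and the second shows three identical ones, that matches the behaviour seen with the joblib workers: the seed is a property of each interpreter process, and it is read from the environment when that process starts.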