Python hashes differ between threads

I'm using Python 3.6 on Windows and have a parallelizable task that includes computing string hashes. This is basically a minimal version of my problem:

#!/usr/bin/env python3
from joblib import Parallel, delayed


def hash_some(foo):
    return hash(foo)


def main():
    hashes = Parallel(n_jobs=10)(delayed(hash_some)(s) for s in ['a', 'a', 'a'])

    print(hashes)


if __name__ == '__main__':
    main()

Now, for some reason this prints, e.g., the following:

[3220780809080710068, -561460911962106608, -1551910331007446174]

I would expect all three values to be the same.

The hashes don't always differ, and especially for a lower n_jobs value they often turn out the same, but this is not guaranteed.

I know hash() uses a random seed per program invocation, but why does it apparently use a different seed per thread? Is there any way I can set a fixed (but random) seed for all my threads? (I know about PYTHONHASHSEED=0, but I'd prefer to find an in-code solution.)
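The output above is consistent with each joblib worker being a separate Python process rather than a thread, since every interpreter process draws its own hash seed at startup. The sketch below reproduces that effect without joblib by running hash() in fresh interpreters. The helper name child_hash is hypothetical, and the sketch assumes that launching children through subprocess behaves like joblib's workers for this purpose.

```python
import os
import subprocess
import sys


def child_hash(text, env):
    """Run hash(text) in a fresh interpreter and return the result as an int."""
    code = "import sys; print(hash(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", code, text],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return int(result.stdout)


# Copy the current environment, but make sure no hash seed is pinned.
env = dict(os.environ)
env.pop("PYTHONHASHSEED", None)

# Each child is a new process, so each one picks its own random seed.
unseeded = [child_hash("a", env) for _ in range(3)]
print(unseeded)  # three values that almost certainly differ
```

With a small n_jobs, several inputs can land on the same worker process, which would explain why the hashes sometimes agree.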

As you've already noted, hash randomization can be controlled with PYTHONHASHSEED; see the Python documentation for that variable for more information. If you want to control the behaviour from code, rather than through interpreter options or by exporting that environment variable, a possible solution could be something like this:

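#!/usr/bin/env python3
import os
from joblib import Parallel, delayed

# Set the seed before any workers are started so that they inherit it.
# This does not change the seed of the already-running parent interpreter.
os.environ['PYTHONHASHSEED'] = '0'

def hash_some(foo):
    return hash(foo)

def main():
    # Hash the same one-character string 10000 times across the workers.
    hashes = Parallel(n_jobs=10)(delayed(hash_some)(s) for s in 'a' * 10000)

    # A single element means every worker produced the same hash.
    print(set(hashes))

if __name__ == '__main__':
    main()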

If you comment out the os.environ line, you'll see that the final set no longer has length 1.
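The question asked for a fixed but random seed, so a variation on the same idea is to draw one seed in the parent and hand it to every child through the environment. This is a minimal sketch that uses plain subprocess children in place of joblib workers; child_hash is a hypothetical helper, and it assumes the workers are started after the variable is set and inherit the parent's environment.

```python
import os
import random
import subprocess
import sys


def child_hash(text, env):
    """Run hash(text) in a fresh interpreter and return the result as an int."""
    code = "import sys; print(hash(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", code, text],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return int(result.stdout)


# Draw one seed per program run; PYTHONHASHSEED accepts 0 to 4294967295.
seed = random.randrange(2**32)

# Every child receives the same seed through its environment.
env = dict(os.environ, PYTHONHASHSEED=str(seed))

seeded = [child_hash("a", env) for _ in range(3)]
print(len(set(seeded)))  # all children share the seed, so the hashes agree
```

In the joblib example, the equivalent change would be to assign str(random.randrange(2**32)) to os.environ['PYTHONHASHSEED'] in place of '0'. The parent's own hash() results are unaffected, because its seed was fixed when the interpreter started. If all that is needed is a value that agrees across processes, a digest from hashlib (for example hashlib.sha256) does not depend on PYTHONHASHSEED at all.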
