Python 3.8 multiprocessing: copying a random array on macOS

from multiprocessing import Pool, cpu_count
import numpy as np
from numpy.random import multivariate_normal

F = multivariate_normal(np.zeros(3), np.eye(3), (3, 5))

def test(k):
    print(k)
    res = np.zeros((5, 3))
    for i in range(3):
        res[:, i] = F[k, :, i]
        #print(res[:, i])
    return res


if __name__ == '__main__':
    with Pool(cpu_count()) as pool:
        result = pool.map(test, range(3))
    pool.close()
    pool.join()
    result = np.array(results)

In Python 3.6, the result is equal to the random matrix F, but in Python 3.8 the two matrices are different. This is just an example. In the real code, I want to pick out each column of F at each time step and do some operations on it.

Psychic debugging: You're running on Windows (any Python version) or macOS (Python 3.8 or later), both of which default to the 'spawn' method of making worker processes, rather than 'fork'. When that happens, the __main__ module is imported (with a different name, so it doesn't try to run the main code again) in the child process to simulate a fork.
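To check which case applies on a given machine, the standard library can report the start method in use. This is only a minimal sketch; it prints whatever your platform and Python version default to.

```python
import multiprocessing as mp

if __name__ == '__main__':
    # 'spawn' is the default on Windows and, since Python 3.8, on macOS.
    print(mp.get_start_method())
    # Start methods this platform supports; 'fork' is not available on Windows.
    print(mp.get_all_start_methods())
```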

This mostly works, but it fails badly in the case of self-seeding PRNGs, because they reseed in the child process, and the globals are regenerated from that new PRNG, rather than having their generated values inherited by the child.

In short, the ways to make this work are:

  1. Run it with 'fork' as the multiprocessing start method (not possible on Windows; technically allowed on macOS, but it can break, which is why the default was changed to 'spawn'). I tested your code on Linux (where it forks by default), and aside from fixing a typo (you typed results in one place where it should be result), result and F are the same there. When I add multiprocessing.set_start_method('spawn') just before creating the Pool, they don't match.
  2. Explicitly pass an initializer function and arguments to Pool so each worker resets F to the value seen in the parent.
  3. Use an explicit seed before generating F so it's consistent no matter the process (downside: it'll be the same every run, or at least very predictable, depending on how clever you try to get).

Note that #2 and #3 can be combined to minimize dataflow. Generate a "real" random seed in the parent (e.g. with os.urandom) and write a simple function that accepts a seed and uses it to both seed the PRNG and generate F (using global F to let it change the global value). Call that function in the parent, and pass it as the initializer, with the seed as its argument, to each child. Now, instead of passing the generated value of F (potentially huge), you only need to pass the seed, and the child process can reproduce F locally without needing to serialize the whole thing. Downside: all processes share the same random seed; it's not predictable like a hardcoded seed, but parents and children will be drawing from an identical set of random numbers.


 