
Python 3.8 multiprocessing: copying a random matrix on macOS

from multiprocessing import Pool, cpu_count
import numpy as np
from numpy.random import multivariate_normal

F = multivariate_normal(np.zeros(3), np.eye(3), (3, 5))

def test(k):
    print(k)
    res = np.zeros((5, 3))
    for i in range(3):
        res[:, i] = F[k, :, i]
        #print(res[:, i])
    return res


if __name__ == '__main__':
    with Pool(cpu_count()) as pool:
        result = pool.map(test, range(3))
    pool.close()
    pool.join()
    result = np.array(results)

In Python 3.6, result is equal to the random matrix F, but in Python 3.8 the two matrices are different. This is just an example; in the real code, I want to pick out each column of F at each time step and do some operations on it.

Psychic debugging: You're running on Windows (any Python version) or macOS (Python 3.8 or later), both of which default to the 'spawn' method of creating worker processes rather than 'fork'. When that happens, the __main__ module is imported in the child process (under a different name, so it doesn't try to run the main code again) to simulate a fork.

This mostly works, but it fails badly in the case of self-seeding PRNGs, because they reseed in the child process, and the globals are regenerated from that new PRNG, rather than having their generated values inherited by the child.
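To confirm which case you're in, you can ask multiprocessing which start method is in effect. This is only a small diagnostic sketch; the helper name start_method_report is made up for illustration and is not part of the original code.

```python
import multiprocessing as mp


def start_method_report():
    """Return the active start method and whether workers inherit parent globals.

    Hypothetical helper for illustration. Only 'fork' copies the parent's
    memory, so only 'fork' lets a worker see the parent's already-generated F.
    """
    method = mp.get_start_method()
    inherits_globals = (method == "fork")
    return method, inherits_globals


if __name__ == "__main__":
    method, inherits = start_method_report()
    print(f"start method: {method}")
    print(f"workers inherit parent globals: {inherits}")
```

If this reports anything other than 'fork', module-level code such as the line that generates F is executed again in every worker.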

In short, the ways to make this work are:

  1. Run it with 'fork' as the multiprocessing start method. This is not possible on Windows; it is technically allowed on macOS but could break, which is why the default was changed to 'spawn'. I tested your code on Linux (where it forks by default), and aside from fixing a typo (you typed results in one place where it should be result), result and F are the same there. When I add multiprocessing.set_start_method('spawn') just before creating the Pool, they don't match.
  2. Explicitly pass an initializer function and arguments to Pool so each worker resets F to the value seen in the parent.
  3. Use an explicit seed before generating F so it's consistent no matter the process (downside: it'll be the same every run, or at least very predictable, depending on how clever you try to get).
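A minimal sketch of option 2, assuming the same shapes as in the question. The names init_worker and copy_slice are made up for illustration (copy_slice stands in for the question's test function), and the pool size of 2 is arbitrary.

```python
from multiprocessing import Pool

import numpy as np
from numpy.random import multivariate_normal

F = None  # filled in per worker by init_worker, not at import time


def init_worker(parent_F):
    """Pool initializer: overwrite this process's global F with the parent's copy."""
    global F
    F = parent_F


def copy_slice(k):
    """Stand-in for the question's test(k): return a copy of F[k], shape (5, 3)."""
    return F[k].copy()


def main():
    # Generate F only in the parent, inside the __main__ guard, so a spawned
    # worker that re-imports this module does not regenerate it.
    parent_F = multivariate_normal(np.zeros(3), np.eye(3), (3, 5))
    with Pool(2, initializer=init_worker, initargs=(parent_F,)) as pool:
        result = np.array(pool.map(copy_slice, range(3)))
    print(np.array_equal(result, parent_F))


if __name__ == "__main__":
    main()
```

The cost is that parent_F is pickled and sent to every worker, which may matter if F is large.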

Note that #2 and #3 can be combined to minimize dataflow. Generate a "real" random seed in the parent (e.g. with os.urandom) and write a simple function that accepts a seed and uses it to both seed the PRNG and generate F (using global F to let it change the global value). Call that function in the parent and pass it as the initializer with the seed argument to each child. Now, instead of passing the generated value of F (potentially huge), you only need to pass the seed, and the child process can reproduce F locally without needing to serialize the whole thing. Downside: All processes share the same random seed; it's not predictable like a hardcoded seed, but parents and children will be drawing from an identical set of random numbers.
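The combined approach might look roughly like this. It is a sketch, not tested against your real code; build_F, init_F and copy_slice are illustrative names, and the seed is limited to 4 bytes because np.random.seed only accepts values below 2**32.

```python
import os
from multiprocessing import Pool

import numpy as np

F = None  # rebuilt from the shared seed in the parent and in every worker


def build_F(seed):
    """Seed NumPy's global PRNG and return the matrix generated from it."""
    np.random.seed(seed)
    return np.random.multivariate_normal(np.zeros(3), np.eye(3), (3, 5))


def init_F(seed):
    """Initializer for parent and workers: set the global F from the seed."""
    global F
    F = build_F(seed)


def copy_slice(k):
    """Stand-in for the question's test(k): return a copy of F[k]."""
    return F[k].copy()


def main():
    # A fresh, unpredictable seed per run; 4 bytes keeps it below 2**32.
    seed = int.from_bytes(os.urandom(4), "little")
    init_F(seed)  # the parent builds F from the seed
    with Pool(2, initializer=init_F, initargs=(seed,)) as pool:
        # each worker rebuilds the same F; only the small integer seed is sent
        result = np.array(pool.map(copy_slice, range(3)))
    print(np.array_equal(result, F))


if __name__ == "__main__":
    main()
```

Because init_F seeds the global PRNG, any random numbers drawn afterwards follow the same sequence in the parent and in every worker, which is the downside mentioned above.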
