
Python: parallel execution of a function which has a sequential loop inside

I am reproducing some simple 10-arm bandit experiments from Sutton and Barto's book Reinforcement Learning: An Introduction. Some of them require significant computation time, so I tried to take advantage of my multi-core CPU.

Here is the function which I need to run 2000 times. It has 1000 sequential steps which incrementally improve the reward:

import numpy as np

def foo(eps): # need an (unused) argument to use pool.map()
    # initialising
    # the true values of the actions
    q = np.random.normal(0, 1, size=10)
    # the estimated values
    q_est = np.zeros(10)
    # the counter of how many times each of the 10 actions was chosen
    n = np.zeros(10)

    rewards = []
    for i in range(1000):
        # choose an action based on its estimated value
        a = np.argmax(q_est)
        # get the normally distributed reward 
        rewards.append(np.random.normal(q[a], 1)) 
        # increment the chosen action counter
        n[a] += 1 
        # update the estimated value of the action
        q_est[a] += (rewards[-1] - q_est[a]) / n[a] 
    return rewards

I execute this function 2000 times to get a (2000, 1000) array:

reward = np.array([foo(0) for _ in range(2000)])

Then I plot the mean reward across 2000 experiments:

import matplotlib.pyplot as plt
plt.plot(np.arange(1000), reward.mean(axis=0))

[sequential plot]

which fully corresponds to the expected result (it looks the same as in the book). But when I try to execute it in parallel, I get a much greater standard deviation of the average reward:

import multiprocessing as mp
with mp.Pool(mp.cpu_count()) as pool:
    reward_p = np.array(pool.map(foo, [0]*2000))
plt.plot(np.arange(1000), reward_p.mean(axis=0))

[parallel plot]

I suppose this is due to the parallelization of the loop inside foo. As I reduce the number of cores allocated to the task, the reward plot approaches the expected shape.

Is there a way to take advantage of multiprocessing here while still getting correct results?

UPD: I tried running the same code on Windows 10, and the sequential and parallel results turned out to be the same! What may be the reason?

Ubuntu 20.04, Python 3.8.5, Jupyter

Windows 10, Python 3.7.3, Jupyter
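
You can check which start method each interpreter uses by default (this check is my addition, not part of the original post):

import multiprocessing as mp
print(mp.get_start_method())  # 'fork' on the Ubuntu box, 'spawn' on Windows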

As we found out, the results differ between Windows and Ubuntu. This is probably because of the different default process start methods, described in the Python multiprocessing documentation:

spawn: The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

Available on Unix and Windows. The default on Windows and macOS.

fork: The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

Available on Unix only. The default on Unix.
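
Under fork, every worker inherits the parent's global NumPy RNG state, so the workers generate identical random streams and the 2000 runs are no longer independent; that is what inflates the variance of the averaged reward. Under spawn, each fresh interpreter re-seeds itself at startup. A minimal sketch to observe this (my illustration, not part of the original post):

import multiprocessing as mp
import numpy as np

def first_draw(_):
    # uses the global legacy RNG state inherited from the parent
    return np.random.normal()

if __name__ == '__main__':
    mp.set_start_method('fork')  # the Unix default; not available on Windows
    with mp.Pool(4) as pool:
        print(pool.map(first_draw, range(4)))
    # this typically prints duplicated values, because every forked
    # worker starts from the same inherited RNG state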

Try adding this line to your code (it must be called at most once, before any pool is created):

mp.set_start_method('spawn')
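
Note, however, that spawn can be awkward in a Jupyter notebook: spawned workers re-import __main__, and a function defined interactively in the notebook may not be importable there. An alternative that works under both fork and spawn is to make each run explicitly independent by giving it its own seed. This is a sketch of that approach, not the original code; the seed parameter replaces the unused eps argument:

import multiprocessing as mp
import numpy as np

def foo(seed):
    rng = np.random.default_rng(seed)  # private generator for this run
    q = rng.normal(0, 1, size=10)      # true action values
    q_est = np.zeros(10)               # estimated action values
    n = np.zeros(10)                   # per-action choice counters

    rewards = []
    for i in range(1000):
        a = np.argmax(q_est)                 # greedy action choice
        rewards.append(rng.normal(q[a], 1))  # sample the reward
        n[a] += 1
        q_est[a] += (rewards[-1] - q_est[a]) / n[a]  # incremental mean update
    return rewards

# 2000 statistically independent seed streams (NumPy's recommended pattern)
seeds = np.random.SeedSequence().spawn(2000)
with mp.Pool(mp.cpu_count()) as pool:
    reward_p = np.array(pool.map(foo, seeds))

With per-run seeds the 2000 experiments are independent regardless of the start method, so the parallel mean should match the sequential one.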
