
Python: parallel execution of a function which has a sequential loop inside

I am reproducing some simple 10-arm bandit experiments from Sutton and Barto's book Reinforcement Learning: An Introduction. Some of them require significant computation time, so I tried to take advantage of my multi-core CPU.

Here is the function which I need to run 2000 times. It has 1000 sequential steps which incrementally improve the reward:

import numpy as np

def foo(eps): # need an (unused) argument to use pool.map()
    # initialising
    # the true values of the actions
    q = np.random.normal(0, 1, size=10)
    # the estimated values
    q_est = np.zeros(10)
    # the counter of how many times each of the 10 actions was chosen
    n = np.zeros(10)

    rewards = []
    for i in range(1000):
        # choose an action based on its estimated value
        a = np.argmax(q_est)
        # get the normally distributed reward 
        rewards.append(np.random.normal(q[a], 1)) 
        # increment the chosen action counter
        n[a] += 1 
        # update the estimated value of the action
        q_est[a] += (rewards[-1] - q_est[a]) / n[a] 
    return rewards

I execute this function 2000 times to get a (2000, 1000) array:

reward = np.array([foo(0) for _ in range(2000)])

Then I plot the mean reward across 2000 experiments:

import matplotlib.pyplot as plt
plt.plot(np.arange(1000), reward.mean(axis=0))

[sequential plot]

which fully corresponds to the expected result (it looks the same as in the book). But when I try to execute it in parallel, I get a much greater standard deviation of the average reward:

import multiprocessing as mp
with mp.Pool(mp.cpu_count()) as pool:
    reward_p = np.array(pool.map(foo, [0]*2000))
plt.plot(np.arange(1000), reward_p.mean(axis=0))

[parallel plot]

I suppose this is due to the parallelization of the loop inside foo. As I reduce the number of cores allocated to the task, the reward plot approaches the expected shape.

Is there a way to take advantage of multiprocessing here while still getting correct results?

UPD: I tried running the same code on Windows 10, and the sequential and parallel results turned out to be the same! What may be the reason?

Ubuntu 20.04, Python 3.8.5, Jupyter

Windows 10, Python 3.7.3, Jupyter
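
You can check which start method each interpreter uses by default (this check is my addition, not part of the original post):

import multiprocessing as mp
print(mp.get_start_method())  # 'fork' on the Ubuntu box, 'spawn' on Windows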

As we found out, the results differ between Windows and Ubuntu. This is probably because of the different default process start methods, described in the Python multiprocessing documentation:

spawn: The parent process starts a fresh Python interpreter process. The child process will only inherit those resources necessary to run the process object's run() method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using fork or forkserver.

Available on Unix and Windows. The default on Windows and macOS.

fork: The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.

Available on Unix only. The default on Unix.
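
Under fork, every worker inherits the parent's global NumPy RNG state, so the workers generate identical random streams and the 2000 runs are no longer independent; that is what inflates the variance of the averaged reward. Under spawn, each fresh interpreter re-seeds itself at startup. A minimal sketch to observe this (my illustration, not part of the original post):

import multiprocessing as mp
import numpy as np

def first_draw(_):
    # uses the global legacy RNG state inherited from the parent
    return np.random.normal()

if __name__ == '__main__':
    mp.set_start_method('fork')  # the Unix default; not available on Windows
    with mp.Pool(4) as pool:
        print(pool.map(first_draw, range(4)))
    # this typically prints duplicated values, because every forked
    # worker starts from the same inherited RNG state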

Try adding this line to your code (it must be called at most once, before any pool is created):

mp.set_start_method('spawn')
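
Note, however, that spawn can be awkward in a Jupyter notebook: spawned workers re-import __main__, and a function defined interactively in the notebook may not be importable there. An alternative that works under both fork and spawn is to make each run explicitly independent by giving it its own seed. This is a sketch of that approach, not the original code; the seed parameter replaces the unused eps argument:

import multiprocessing as mp
import numpy as np

def foo(seed):
    rng = np.random.default_rng(seed)  # private generator for this run
    q = rng.normal(0, 1, size=10)      # true action values
    q_est = np.zeros(10)               # estimated action values
    n = np.zeros(10)                   # per-action choice counters

    rewards = []
    for i in range(1000):
        a = np.argmax(q_est)                 # greedy action choice
        rewards.append(rng.normal(q[a], 1))  # sample the reward
        n[a] += 1
        q_est[a] += (rewards[-1] - q_est[a]) / n[a]  # incremental mean update
    return rewards

# 2000 statistically independent seed streams (NumPy's recommended pattern)
seeds = np.random.SeedSequence().spawn(2000)
with mp.Pool(mp.cpu_count()) as pool:
    reward_p = np.array(pool.map(foo, seeds))

With per-run seeds the 2000 experiments are independent regardless of the start method, so the parallel mean should match the sequential one.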
