I know many people have asked similar questions, but I can't find anything that explains this phenomenon. Here is my code.
import time
from multiprocessing import Pool
import numpy as np

def _foo(x):
    np.linalg.inv(x)

if __name__ == '__main__':
    t = time.time()
    r = np.random.rand(1000, 1000)
    p = Pool(2)
    p.map(_foo, [r.copy() for i in range(8)])
    print 'Finished in', time.time() - t, 'sec'
When I test my code with other time-consuming operations instead of np.linalg.inv, it works fine: I do get a performance improvement as the size of the Pool increases. However, when I use np.linalg.inv in _foo, Pool(2) is extremely slow compared to Pool(1): Pool(1) finishes in 0.77 sec while Pool(2) takes 9.84 sec. The code was tested on a machine with 6 physical cores.
The only explanation I can think of is that the inv method is sharing some resource between the processes. But I pass a copy of r to every process, so that should not be necessary.
I finally figured it out. It is a "bug" of numpy built with OpenBLAS on Ubuntu. Since Ubuntu 12.04, OpenBLAS has been multithreaded. So when I start two processes to accelerate my computation, each process spawns its own OpenBLAS thread pool, and there are actually 24 threads running on 6 physical cores. It is a typical oversubscription problem.
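One way to check which BLAS library numpy was built against is np.show_config(), which prints the build configuration. This is a minimal sketch; the exact output depends on how your numpy was built (OpenBLAS, MKL, ATLAS, ...):

```python
import io
import contextlib
import numpy as np

# np.show_config() prints the BLAS/LAPACK build information to stdout;
# capture it so we can inspect it programmatically.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
info = buf.getvalue()

# Look for the BLAS backend name in the captured output.
print('openblas' in info.lower())
```

If this prints True, your numpy is linked against OpenBLAS and each worker process will start its own OpenBLAS thread pool.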
My way to solve it is to set the environment variable OPENBLAS_NUM_THREADS=1. This forces OpenBLAS to run in single-threaded mode, so each worker process uses exactly one core.