I know many people have asked similar questions, but I can't find anything that explains this phenomenon. Here is my code.
import time
from multiprocessing import Pool
import numpy as np

def _foo(x):
    np.linalg.inv(x)

if __name__ == '__main__':
    t = time.time()
    r = np.random.rand(1000, 1000)
    p = Pool(2)
    p.map(_foo, [r.copy() for i in range(8)])
    print 'Finished in', time.time() - t, 'sec'
When I test my code with other time-consuming operations instead of np.linalg.inv, it works fine: I do get a performance improvement as the size of the Pool increases. However, when I use np.linalg.inv in _foo, Pool(2) is extremely slow compared to Pool(1): Pool(1) finishes in 0.77 sec while Pool(2) takes 9.84 sec. The code was tested on a machine with 6 physical cores.
The only explanation I can think of is that the inv method is sharing some resource between the processes. But I pass a copy of r to every process, so that should not be necessary.
I finally figured it out. It is a "bug" of numpy built with OpenBLAS on Ubuntu. Since Ubuntu 12.04, OpenBLAS has been multithreaded. So when I start two processes to accelerate my computation, each process spawns its own OpenBLAS thread pool, and there are actually 24 threads running on 6 physical cores. It is a typical oversubscription problem.
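One way to check which BLAS library numpy was built against is np.show_config(), which prints the build configuration. This is a minimal sketch; the exact output depends on how your numpy was built (OpenBLAS, MKL, ATLAS, ...):

```python
import io
import contextlib
import numpy as np

# np.show_config() prints the BLAS/LAPACK build information to stdout;
# capture it so we can inspect it programmatically.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
info = buf.getvalue()

# Look for the BLAS backend name in the captured output.
print('openblas' in info.lower())
```

If this prints True, your numpy is linked against OpenBLAS and each worker process will start its own OpenBLAS thread pool.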
My way to solve it is to set the environment variable OPENBLAS_NUM_THREADS=1. This forces OpenBLAS to run in single-threaded mode, so each worker process uses exactly one core.