Anaconda MKL can't set number of threads

Question

I was using NumPy from Anaconda to benchmark a large matrix multiplication (8192x8192, float32) in Jupyter, like this:

import numpy as np
a = np.empty((8192, 8192), 'f')
%timeit a @ a

This NumPy is built against MKL. While the multiplication runs (continuously), the CPU usage of the python process is always 50%. I wondered why it isn't 100%, since matrix multiplication should be parallelized automatically. I googled around and found two ways to set the number of threads MKL uses.

One way is directly using the DLL:

from ctypes import CDLL
mkl = CDLL('../conda/pkgs/mkl-2019.0-118/Library/bin/mkl_rt.dll')
print(mkl.MKL_Set_Num_Threads(4))
print(mkl.MKL_Get_Max_Threads())

which I believe gave me some unknown error code and failed to set:

-899695632
2

Another way is through the mkl-service package:

import mkl
print(mkl.set_num_threads(4))
print(mkl.get_max_threads())

which also didn't succeed:

None
2

I'm wondering why setting 4 threads in MKL keeps failing and how to make it work. I'm on 64-bit Windows 7. My CPU is an i5-2520M, which should have 4 cores. My Anaconda environment looks like this (abbreviated):

mkl                       2019.0                      118
mkl-service               1.1.2            py36hb217b18_5
mkl_fft                   1.0.6            py36hdbbee80_0
mkl_random                1.0.1            py36h77b88f5_1
numpy                     1.15.3           py36ha559c80_0
numpy-base                1.15.3           py36h8128ebf_0
zeromq                    4.2.5                he025d50_1

Answer 1

Please consider this documentation: https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-intel-mkl-100-threading

The key variable is MKL_NUM_THREADS, which you can set as a system-wide Windows environment variable.
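
If you would rather not change system settings, the variable can also be set for a single child Python process. As far as I know, MKL reads MKL_NUM_THREADS once, when the library initializes, so it has to be in the environment before numpy is imported. A minimal sketch; run_with_mkl_threads is a hypothetical helper, not part of any library:

```python
import os
import subprocess
import sys


def run_with_mkl_threads(code, num_threads):
    """Run `code` in a fresh Python process with MKL_NUM_THREADS set.

    Hypothetical helper for illustration only.
    """
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, MKL_NUM_THREADS=str(num_threads))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Confirm that the child process sees the setting.
print(run_with_mkl_threads("import os; print(os.environ['MKL_NUM_THREADS'])", 4))
```

To check the effect on MKL itself, the child code could be the mkl.get_max_threads() call from the question.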

I strongly disagree with @roro on this. The reason you are seeing 50% is that you are not using your hyperthreading capabilities. Having said that, bear in mind that two factors limit the speed of a calculation: CPU power and memory access bandwidth. Oftentimes the second will limit the speed to, say, 70% of your CPU power, because RAM and cache cannot deliver data fast enough to the algorithm.
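
The 50% figure is consistent with simple arithmetic. The i5-2520M has 2 physical cores with hyperthreading, so Windows sees 4 logical CPUs and reports usage as a share of those. The sketch below assumes each MKL thread keeps one logical CPU fully busy; expected_cpu_percent is a hypothetical helper:

```python
import os


def expected_cpu_percent(busy_threads, logical_cpus):
    """Overall CPU usage if each busy thread saturates one logical CPU."""
    # Usage cannot exceed 100% even if threads outnumber logical CPUs.
    return 100.0 * min(busy_threads, logical_cpus) / logical_cpus


# 2 MKL threads (the value get_max_threads reported) on 4 logical CPUs.
print(expected_cpu_percent(2, 4))  # 50.0

# os.cpu_count() reports logical CPUs, hyperthreads included.
print(os.cpu_count())
```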

Getting parallelism right is among the more challenging parts of HPC.
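
On the -899695632 printed in the question: that is probably not an error code. ctypes assumes every foreign function returns a C int unless told otherwise, and as far as I know MKL_Set_Num_Threads is declared as returning void, so the printed number would be whatever happened to be in the return register. A sketch of declaring the prototypes before calling, assuming the C signatures given in the docstring; declare_mkl_prototypes is a hypothetical helper that takes the already loaded library:

```python
import ctypes


def declare_mkl_prototypes(lib):
    """Attach argument and return types to the two MKL entry points.

    Assumed C signatures (check them against your mkl_service.h):
        void MKL_Set_Num_Threads(int nth);
        int  MKL_Get_Max_Threads(void);
    """
    lib.MKL_Set_Num_Threads.argtypes = [ctypes.c_int]
    lib.MKL_Set_Num_Threads.restype = None  # void: the call returns None
    lib.MKL_Get_Max_Threads.argtypes = []
    lib.MKL_Get_Max_Threads.restype = ctypes.c_int
    return lib


# Usage with the library from the question (same path as in the question):
# mkl = declare_mkl_prototypes(
#     ctypes.CDLL('../conda/pkgs/mkl-2019.0-118/Library/bin/mkl_rt.dll'))
# mkl.MKL_Set_Num_Threads(4)
# print(mkl.MKL_Get_Max_Threads())
```

This only removes the misleading return value. It does not by itself change how many threads MKL decides to use.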
