There is no n_jobs
parameter for GaussianMixture . Meanwhile, whenever I fit the model
from sklearn.mixture import GaussianMixture as GMM
gmm = GMM(n_components=4,
          init_params='random',
          covariance_type='full',
          tol=1e-2,
          max_iter=100,
          n_init=1)
gmm.fit(X)  # GaussianMixture is unsupervised; a y argument is ignored
it spawns 16 threads and uses the full CPU power of my 16-CPU machine. I do not want it to do that.
In comparison, KMeans has an n_jobs
parameter that controls multiprocessing when there are multiple initializations ( n_init
> 1). Here, the multiprocessing comes out of the blue.
My question is: where is it coming from, and how can I control it?
You are observing parallelism in the underlying linear-algebra operations, which are sped up by BLAS / LAPACK .
Modifying this is not as simple as setting an n_jobs
parameter, and the details depend on which implementation is in use!
Common candidates are ATLAS, OpenBLAS and Intel's MKL.
I recommend checking which one is used first, then acting accordingly:
import numpy as np
np.__config__.show()
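If you want to apply the limit from within Python rather than the shell, a minimal sketch (assuming an MKL or OpenBLAS build of numpy) is to set the variables before numpy is first imported:

```python
import os

# These must be set BEFORE numpy (and hence the BLAS library) is loaded;
# changing them after import usually has no effect on the thread pool.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"        # only read by MKL builds
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # only read by OpenBLAS builds

import numpy as np
np.__config__.show()  # verify which BLAS/LAPACK backend is actually in use
```

Which of the `*_NUM_THREADS` variables takes effect depends on the backend reported by `np.__config__.show()`; setting all of them is a harmless way to cover the common cases.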
Sadly, these things can get tricky. A valid environment for MKL, for example, can look like this (source) :
export MKL_NUM_THREADS="2"
export MKL_DOMAIN_NUM_THREADS="MKL_BLAS=2"
export OMP_NUM_THREADS="1"
export MKL_DYNAMIC="FALSE"
export OMP_DYNAMIC="FALSE"
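Instead of exporting the variables globally, they can also be set per-invocation on the command line. A small sketch (the `python -c` body just echoes the variable back to confirm it is visible to the child process):

```shell
# Limit MKL to 2 threads for a single run, without polluting the shell environment
MKL_NUM_THREADS=2 OMP_NUM_THREADS=1 python -c "import os; print(os.environ['MKL_NUM_THREADS'])"
```

In a real workflow you would replace the `python -c` call with your training script.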
For ATLAS, it seems, you define this at compile time.
And according to this answer , the same applies to OpenBLAS.
As the OP tested, it seems you can get away with setting the OpenMP environment variable, which modifies the behaviour even for the open-source candidates ATLAS and OpenBLAS (where a compile-time limit is the alternative):
export OMP_NUM_THREADS="4";
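Putting it together, a self-contained sketch of the OP's fit with the thread limit applied (the synthetic data here is an assumption for illustration):

```python
import os
os.environ["OMP_NUM_THREADS"] = "1"  # set before numpy/sklearn are imported

import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for the OP's X
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

gmm = GaussianMixture(n_components=4,
                      init_params='random',
                      covariance_type='full',
                      tol=1e-2,
                      max_iter=100,
                      n_init=1)
gmm.fit(X)  # should now stay on a single core
print(gmm.means_.shape)
```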