Unable to use joblib.Parallel with CPU for n number of iterations along with GPU for training models

I am developing a custom cross-validation function in Python that runs for a given number of iterations, where each iteration executes the following steps:

  1. The dataset is randomly shuffled and split into training and testing sets
  2. The model is compiled and trained on the training set
  3. Predictions are made on the test set

At the end, the average MSE across all iterations is calculated. For 100 iterations this takes around 2.5 hours.
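For context, one iteration looks roughly like the sketch below (a minimal example, assuming a Keras regression model and scikit-learn's train_test_split; X, y and build_model are placeholders for my data and model-building code):

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def myfunction(i):
    # Step 1: shuffle and make a fresh random train/test split per iteration
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=True, random_state=i)

    # Step 2: compile and train the model (build_model is a placeholder)
    model = build_model()
    model.fit(X_train, y_train, epochs=50, verbose=0)

    # Step 3: predict on the test set and return this iteration's MSE
    y_pred = model.predict(X_test)
    return mean_squared_error(y_test, y_pred)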

In order to speed up this process, I would like to utilize the multiple CPU cores (4 physical cores, 2 threads each) available on my Windows 10 machine, which has a single GPU attached (Nvidia Quadro, 5 GB).

I found that joblib.Parallel can be used to execute the tasks of a for loop on multiple CPU cores in parallel, and I used it as follows:

from joblib import Parallel, delayed, parallel_backend

# Run the iterations on 2 CPU cores, each worker limited to 2 threads
with parallel_backend("loky", inner_max_num_threads=2):
    Parallel(n_jobs=2)(delayed(myfunction)(i) for i in range(n_iterations))

To limit memory growth on the GPU, I added the following code:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)

But even for 2 iterations, I am getting the following error:

InternalError:  Blas GEMV launch failed:  m=3, n=128

Function call stack:
train_function

Please let me know if there is a solution or an alternative to joblib.Parallel for running the iterations in parallel on the CPU while using the GPU for training the machine learning models.

Thanks in advance!!

Surya

I solved my problem by replacing "joblib.Parallel" with Python's multiprocessing module.

I used a Pool of processes and a Manager object for sharing common information, as follows:

from multiprocessing import Pool, Manager

with Pool(processes=n_jobs) as pool:
    with Manager() as manager:
        # Proxy list shared between the child processes
        shared_list = manager.list()

        results = []
        for i in range(n_iterations):
            # Submit one cross-validation iteration per task
            # (the arguments shown here are illustrative)
            result = pool.apply_async(myfunction, args=(i, shared_list))
            results.append(result)

        # Wait for every task, logging failures without aborting the rest
        for result in results:
            try:
                result.get()
            except BaseException as e:
                print('\nresult get error:', e)
                continue
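Once every task has finished, the average MSE can be computed from the shared list, still inside the Manager block (before the proxy is torn down):

        # Average the per-iteration MSEs collected by the workers
        avg_mse = sum(shared_list) / len(shared_list)
        print('Average MSE over', n_iterations, 'iterations:', avg_mse)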

And regarding limiting GPU memory growth, the following code has to be executed separately inside each individual process, i.e. in the target function that runs in each child process (each child gets its own TensorFlow runtime):

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
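Putting it together, the target function might look like the sketch below (a minimal illustration; train_and_evaluate is a hypothetical helper standing in for the shuffle/train/predict steps from the question):

def myfunction(i, shared_list):
    # Import and configure TensorFlow inside the worker: on Windows each
    # child process is spawned with its own interpreter and TF runtime
    import tensorflow as tf
    for gpu in tf.config.experimental.list_physical_devices('GPU'):
        tf.config.experimental.set_memory_growth(gpu, True)

    # ... shuffle/split, build, train and predict as in the question ...
    mse = train_and_evaluate(i)  # hypothetical helper for one CV iteration

    # Record this iteration's MSE in the Manager-backed shared list
    shared_list.append(mse)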
