Unable to use joblib.Parallel with CPU for n number of iterations along with GPU for training models

I am developing a custom cross-validation function in Python that runs for a given number of iterations, where each iteration executes the following steps:

  1. The dataset is randomly shuffled and split into training and testing sets
  2. The model is compiled and trained on the training set
  3. Predictions are made on the test set

At the end, the average MSE across all iterations is calculated. For 100 iterations this takes around 2.5 hours.
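For context, one iteration looks roughly like the sketch below (a minimal example, assuming a Keras regression model and scikit-learn's train_test_split; X, y and build_model are placeholders for my data and model-building code):

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def myfunction(i):
    # Step 1: shuffle and make a fresh random train/test split per iteration
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, shuffle=True, random_state=i)

    # Step 2: compile and train the model (build_model is a placeholder)
    model = build_model()
    model.fit(X_train, y_train, epochs=50, verbose=0)

    # Step 3: predict on the test set and return this iteration's MSE
    y_pred = model.predict(X_test)
    return mean_squared_error(y_test, y_pred)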

In order to speed up this process, I would like to utilize the multiple CPU cores (4 physical cores, 2 threads each) available on my Windows 10 machine, which has a single GPU attached (Nvidia Quadro, 5 GB).

I found that joblib.Parallel can be used to execute the tasks of a for loop on multiple CPU cores in parallel, and I used it as follows:

from joblib import Parallel, delayed, parallel_backend

# Run the iterations on 2 CPU cores, each worker limited to 2 threads
with parallel_backend("loky", inner_max_num_threads=2):
    Parallel(n_jobs=2)(delayed(myfunction)(i) for i in range(n_iterations))

To limit memory growth on the GPU, I added the following code:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)

But even for 2 iterations, I am getting the following error:

InternalError:  Blas GEMV launch failed:  m=3, n=128

Function call stack:
train_function

Please let me know if there is a solution or an alternative to joblib.Parallel for running the iterations in parallel on the CPU while using the GPU for training the machine learning models.

Thanks in advance!!

Surya

I solved my problem by replacing "joblib.Parallel" with Python's multiprocessing module.

I used a Pool of processes and a Manager object for sharing common information, as follows:

from multiprocessing import Pool, Manager

with Pool(processes=n_jobs) as pool:
    with Manager() as manager:
        # Proxy list shared between the child processes
        shared_list = manager.list()

        results = []
        for i in range(n_iterations):
            # Submit one cross-validation iteration per task
            # (the arguments shown here are illustrative)
            result = pool.apply_async(myfunction, args=(i, shared_list))
            results.append(result)

        # Wait for every task, logging failures without aborting the rest
        for result in results:
            try:
                result.get()
            except BaseException as e:
                print('\nresult get error:', e)
                continue
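Once every task has finished, the average MSE can be computed from the shared list, still inside the Manager block (before the proxy is torn down):

        # Average the per-iteration MSEs collected by the workers
        avg_mse = sum(shared_list) / len(shared_list)
        print('Average MSE over', n_iterations, 'iterations:', avg_mse)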

And regarding limiting GPU memory growth, the following code has to be executed separately inside each individual process, i.e. in the target function that runs in each child process (each child gets its own TensorFlow runtime):

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
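Putting it together, the target function might look like the sketch below (a minimal illustration; train_and_evaluate is a hypothetical helper standing in for the shuffle/train/predict steps from the question):

def myfunction(i, shared_list):
    # Import and configure TensorFlow inside the worker: on Windows each
    # child process is spawned with its own interpreter and TF runtime
    import tensorflow as tf
    for gpu in tf.config.experimental.list_physical_devices('GPU'):
        tf.config.experimental.set_memory_growth(gpu, True)

    # ... shuffle/split, build, train and predict as in the question ...
    mse = train_and_evaluate(i)  # hypothetical helper for one CV iteration

    # Record this iteration's MSE in the Manager-backed shared list
    shared_list.append(mse)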
