
caret: Choosing the correct number of cores in parallel backend

I am trying to use caret to cross-validate an elastic net model, using the glmnet implementation, on an Ubuntu machine with 8 CPU cores and 32 GB of RAM. When I train sequentially, CPU usage is maxed out on one core, but only about 50% of the memory is used on average.
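For context, here is a minimal sketch of the kind of sequential run being described; the data, seed, and tuning settings below are placeholders, not the actual setup:

    library(caret)    # model training / cross-validation
    library(glmnet)   # elastic net backend used by method = "glmnet"

    # Placeholder data standing in for the real training set
    set.seed(42)
    x <- matrix(rnorm(1000 * 50), ncol = 50)
    y <- rnorm(1000)

    # 10-fold cross-validated elastic net, run sequentially
    # (no parallel backend registered)
    ctrl <- trainControl(method = "cv", number = 10)
    fit  <- train(x, y, method = "glmnet", trControl = ctrl, tuneLength = 5)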

  • When I use registerDoMC(cores = xxx), do I need to worry about only registering xxx = floor(100/y) cores, where y is the memory usage of the model when using a single core (in %), in order to not run out of memory? (See the sketch after this list.)

  • Does caret have any heuristics that allow it to figure out the max. number of cores to use?

  • Is there any set of heuristics that I can use to dynamically adjust the number of cores to use my computing resources optimally across different sizes of data and model complexities?
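A minimal sketch of the floor(100/y) heuristic from the first bullet; the 50% figure is taken from the question, and everything else here is an assumption rather than anything caret does for you:

    library(doMC)

    y       <- 50                    # % of RAM used by one sequential run
    mem_cap <- floor(100 / y)        # heuristic from the first bullet
    cores   <- min(mem_cap, parallel::detectCores())

    registerDoMC(cores = cores)      # caret uses this foreach backend automatically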


Edit:

FWIW, attempting to use 8 cores made my machine unresponsive. Clearly caret does not check whether spawning xxx processes is likely to be problematic. How can I then choose the number of cores dynamically?

Clearly caret does not check whether spawning xxx processes is likely to be problematic.

True; it cannot predict future performance of your computer.

You should get an understanding of how much memory the model uses when running sequentially. You can start the training, use top or similar tools to estimate the amount of RAM used, and then kill the process. If you use X GB of RAM sequentially, running on M cores will require roughly X(M+1) GB of RAM: one copy for each of the M workers plus one for the parent session.
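In other words, solve X(M+1) <= total RAM for M. Plugging in the question's own numbers (roughly 50% of 32 GB, i.e. about 16 GB per sequential run):

    X     <- 16   # GB of RAM used by one sequential run (50% of 32 GB)
    total <- 32   # GB of RAM on the machine

    M <- max(1, floor(total / X) - 1)   # largest M with X * (M + 1) <= total
    M   # 1 worker; registering 8 cores would ask for 16 * 9 = 144 GB,
        # which is why the machine became unresponsive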
