
Parallelism in XGBoost machine learning technique

This question concerns the parallelism implementation of XGBoost.

I am trying to optimize XGBoost execution by giving it the parameter nthread = 16, where my system has 24 cores. But when I train my model, CPU utilization never exceeds roughly 20% at any point during training. The code snippet is as follows:

param_30 <- list("objective" = "reg:linear",            # linear regression
                 "subsample" = subsample_30,
                 "colsample_bytree" = colsample_bytree_30,
                 "max_depth" = max_depth_30,             # maximum depth of tree
                 "min_child_weight" = min_child_weight_30,
                 "max_delta_step" = max_delta_step_30,
                 "eta" = eta_30,                         # step size shrinkage
                 "gamma" = gamma_30,                     # minimum loss reduction
                 "nthread" = nthreads_30,                # number of threads to be used
                 "scale_pos_weight" = 1.0
)
model <- xgboost(data = training.matrix[, -5],
                 label = training.matrix[, 5],
                 verbose = 1, nrounds = nrounds_30, params = param_30,
                 maximize = FALSE,
                 early_stopping_rounds = searchGrid$early_stopping_rounds_30[x])

Please explain (if possible) how I can increase CPU utilization and speed up model training. Example code in R would be helpful for further understanding.
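One way to diagnose this is to benchmark wall-clock training time for several nthread values on your own data. Below is a minimal, self-contained sketch using synthetic data (the sizes, seed, and the use of "reg:squarederror" — the current name for the deprecated "reg:linear" objective — are my assumptions, not from the question):

```r
# Sketch: time training runs at different thread counts to see
# whether extra threads actually reduce elapsed time.
library(xgboost)

set.seed(42)
n <- 100000; p <- 20                       # assumed synthetic dataset size
X <- matrix(rnorm(n * p), nrow = n)
y <- rnorm(n)
dtrain <- xgb.DMatrix(data = X, label = y)

for (nt in c(1, 4, 16)) {
  t <- system.time(
    bst <- xgb.train(params = list(objective = "reg:squarederror",
                                   max_depth = 6,
                                   nthread = nt),
                     data = dtrain, nrounds = 50, verbose = 0)
  )
  cat(sprintf("nthread = %2d: %.1f s elapsed\n", nt, t["elapsed"]))
}
```

If the elapsed time barely changes (or gets worse) beyond a few threads, the workload per boosting round is too small to keep 16 cores busy, which matches the low CPU utilization you observe.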

Assumption: this is about the R package of XGBoost.

This is a guess... but I have had this happen to me...

You are spending too much time communicating during the parallelism and never becoming CPU-bound. https://en.wikipedia.org/wiki/CPU-bound

Bottom line: your data isn't large enough (rows and columns), and/or your trees aren't deep enough (max_depth), to warrant that many cores — the threading overhead dominates. xgboost parallelizes the split evaluations within each boosting round, so deep trees on big data can keep the CPU humming at max.

I have trained many models where single-threaded execution outperforms 8/16 cores: too much time spent context switching and not enough work per thread.

**MORE DATA, DEEPER TREES, OR FEWER CORES :)**
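A rough way to see this tradeoff in action is to time the same model on a tiny dataset and a larger one (the sizes, depth, and round count below are illustrative assumptions, not values from the question):

```r
# Sketch of the overhead tradeoff: on tiny data, many threads can be
# slower than one; on larger data with deeper trees, extra cores pay off.
library(xgboost)
set.seed(1)

# Train on n synthetic rows with the given thread count; return elapsed seconds.
time_fit <- function(n, nthread, depth = 8) {
  X <- matrix(rnorm(n * 50), nrow = n)
  y <- rnorm(n)
  dtrain <- xgb.DMatrix(data = X, label = y)
  system.time(
    xgb.train(params = list(objective = "reg:squarederror",
                            max_depth = depth,
                            nthread = nthread),
              data = dtrain, nrounds = 20, verbose = 0)
  )[["elapsed"]]
}

cat("small data :", time_fit(1000, 1), "s (1 thread) vs",
    time_fit(1000, 16), "s (16 threads)\n")
cat("larger data:", time_fit(200000, 1), "s (1 thread) vs",
    time_fit(200000, 16), "s (16 threads)\n")
```

On the small dataset the per-round work is so cheap that thread startup and synchronization eat any gains; only when each round has enough splits to evaluate does the parallelism help.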

I tried to answer this question but my post was deleted by a moderator. Please see https://stackoverflow.com/a/67188355/5452057, which I believe could also help you; it relates to missing MPI support in the xgboost R package for Windows available from CRAN.
