
TPOT taking too long to train

I've been trying TPOT for the first time on a dataset with approximately 7,000 rows. When I fit TPOT on the training set, which is 25% of the full dataset, it takes too long: the code has been running on Google Colab for about 45 minutes and the optimization progress is still at 4%. I'm just following the example shown at http://epistasislab.github.io/tpot/examples/ . Is it typical for TPOT to take this long? So far I don't think it's worth even trying to use it.

TPOT can take quite a long time, depending on your dataset. Consider what TPOT is doing: in the background it evaluates thousands of analysis pipelines and fits thousands of ML models on your data. On a large dataset all that fitting can take a long time, especially on a less powerful computer.
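To put a rough number on "thousands", here is a small sketch of the arithmetic. It assumes the classic TPOT behaviour described in its documentation: a run evaluates `population_size + generations * offspring_size` pipelines, `offspring_size` defaults to `population_size`, and each pipeline is scored with 5-fold cross-validation by default. The helper name `total_pipeline_fits` is made up for illustration and is not part of TPOT.

```python
def total_pipeline_fits(generations, population_size, offspring_size=None, cv_folds=5):
    """Estimate the work in one TPOT run (hypothetical helper, not a TPOT API).

    Returns a tuple: (pipelines evaluated, individual model fits).
    Assumes every pipeline is fitted once per cross-validation fold.
    """
    if offspring_size is None:
        # Assumption: offspring_size falls back to population_size.
        offspring_size = population_size
    pipelines = population_size + generations * offspring_size
    return pipelines, pipelines * cv_folds


# Assumed defaults: generations=100, population_size=100, cv=5.
pipelines, fits = total_pipeline_fits(generations=100, population_size=100)
print(f"{pipelines} pipelines, about {fits} model fits")
```

With those assumed defaults the run covers 10,100 pipelines, so a progress bar sitting at 4% after 45 minutes is plausible. Lowering `generations` or `population_size` shrinks the total directly.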

If you'd like faster results, you have a few options:

  1. Use the "TPOT light" configuration, which uses simpler models and will run faster.

  2. Set the n_jobs parameter to -1 or a number greater than 1, which will allow TPOT to evaluate pipelines in parallel. Setting it to -1 uses all of the available cores and speeds things up significantly if you have a multicore machine.

  3. Subsample the data using the subsample parameter. The default is 1.0, corresponding to using 100% of your training data. You can subsample to lower percentages of the data and TPOT will run faster.
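The three options can be combined in one call. The following is a minimal sketch, assuming the classic `TPOTClassifier` API (the 0.x releases), where `config_dict`, `n_jobs`, and `subsample` are constructor arguments; newer TPOT versions may differ. The helper names `fast_tpot_kwargs` and `fit_fast_tpot`, the 0.5 subsample value, and `verbosity=2` are illustrative choices, not something the answer prescribes.

```python
def fast_tpot_kwargs(subsample=0.5):
    """Build constructor arguments for a quicker TPOT run (hypothetical helper)."""
    if not 0.0 < subsample <= 1.0:
        raise ValueError("subsample must be in the range (0.0, 1.0]")
    return {
        "config_dict": "TPOT light",  # option 1: simpler, faster models
        "n_jobs": -1,                 # option 2: use all available cores
        "subsample": subsample,       # option 3: fit on a fraction of the rows
        "verbosity": 2,               # show the optimization progress bar
    }


def fit_fast_tpot(X_train, y_train, **overrides):
    """Fit a TPOTClassifier with the faster settings above.

    TPOT is imported here so the settings helper works without TPOT installed.
    """
    from tpot import TPOTClassifier

    settings = {**fast_tpot_kwargs(), **overrides}
    model = TPOTClassifier(**settings)
    model.fit(X_train, y_train)
    return model


# Example usage, assuming X_train and y_train already exist:
# model = fit_fast_tpot(X_train, y_train, generations=5, population_size=20)
# model.export("tpot_pipeline.py")
```

Subsampling trades accuracy of the pipeline scores for speed, so it may be worth refitting the exported pipeline on the full training set afterwards.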
