
TPOT taking too long to train

I've been trying to use TPOT for the first time on a dataset with approximately 7,000 rows. When I train TPOT on the training set, which is 25% of the whole dataset, it takes too long: I've been running the code for about 45 minutes on Google Colab and the optimization progress is still at 4%. I've just been following the example at http://epistasislab.github.io/tpot/examples/ . Is it typical for TPOT to take this long? So far I don't think it's even worth trying to use.

TPOT can take quite a long time depending on your dataset. Consider what TPOT is doing: it evaluates thousands of analysis pipelines and fits thousands of ML models on your dataset in the background. With a large dataset, all that fitting can take a long time, especially on a less powerful computer.
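To get a feel for where "thousands" comes from, here is a rough back-of-the-envelope sketch. It assumes the classic TPOT (0.x) behaviour described in its documentation, where a run evaluates population_size + generations × offspring_size pipelines and offspring_size defaults to population_size. The helper names are made up for illustration, and the defaults used in the calls (generations=100, population_size=100, 5-fold cross-validation) are assumptions about that TPOT version, not values taken from the question.

```python
def total_pipelines(generations, population_size, offspring_size=None):
    """Rough count of pipelines one TPOT run evaluates (assumed formula)."""
    if offspring_size is None:
        # Assumption: offspring_size defaults to population_size.
        offspring_size = population_size
    return population_size + generations * offspring_size


def total_fits(generations, population_size, cv_folds=5, offspring_size=None):
    """Each pipeline is fitted once per cross-validation fold."""
    return total_pipelines(generations, population_size, offspring_size) * cv_folds


# Assumed classic-TPOT defaults: 100 generations, population of 100, 5-fold CV.
print(total_pipelines(100, 100))  # 10100
print(total_fits(100, 100))       # 50500

# A much smaller, purely illustrative search budget.
print(total_fits(5, 50))          # 1500
```

This is only an estimate: a pipeline can contain several steps that each need fitting, and pipelines that were already evaluated may be skipped, so the real count will differ.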

If you'd like faster results, you have a few options:

  1. Use the "TPOT light" configuration, which uses simpler models and will run faster.

  2. Set the n_jobs parameter to -1 or a number greater than 1, which allows TPOT to evaluate pipelines in parallel. -1 uses all available cores and speeds things up significantly on a multicore machine.

  3. Subsample the data using the subsample parameter. The default is 1.0, which uses 100% of your training data. Lower values make TPOT run on a smaller fraction of the data and finish faster.
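The three options can be combined in a single call. The sketch below assumes the classic TPOT (0.x) TPOTClassifier API, where config_dict, n_jobs, and subsample are constructor arguments. The specific values, the max_time_mins cap (an extra setting not mentioned above), and the names FAST_TPOT_SETTINGS and fit_fast_tpot are illustrative assumptions, not recommendations from the TPOT authors.

```python
# Illustrative settings only; tune them for your own data and hardware.
FAST_TPOT_SETTINGS = {
    "generations": 5,             # small search budget (assumed value)
    "population_size": 20,        # small population (assumed value)
    "config_dict": "TPOT light",  # option 1: simpler, faster models
    "n_jobs": -1,                 # option 2: use all available cores
    "subsample": 0.5,             # option 3: use 50% of the training rows
    "max_time_mins": 30,          # extra: hard cap on total run time
    "verbosity": 2,               # show a progress bar
    "random_state": 42,
}


def fit_fast_tpot(X_train, y_train, **overrides):
    """Fit a TPOTClassifier with the faster settings above.

    TPOT is imported here so the settings can be inspected without it installed.
    """
    from tpot import TPOTClassifier

    settings = {**FAST_TPOT_SETTINGS, **overrides}
    model = TPOTClassifier(**settings)
    model.fit(X_train, y_train)
    return model
```

Calling fit_fast_tpot(X_train, y_train) returns the fitted model, and keyword overrides such as subsample=0.25 replace individual settings. Keep in mind that a smaller search budget or fewer rows trades some pipeline quality for speed.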
