
Is there a way to parallelize a loop for ensemble learning in Python?

I want to train multiple LightGBM models simultaneously.

Right now, I'm training them sequentially like below:

for m in range(ensemble_n):
    params = {'seed': m}
    model = lgb.train(params, lgbtrain)
    prediction = model.predict(test_df.drop([target], axis=1))
    test_predictions[:, m] = prediction

Is there a way for me to parallelize the loop above?

Training multiple versions of a model in parallel comes at a cost: you need multiple copies of the data loaded into memory, which can get difficult if you have a sizeable dataset.
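If you do want to parallelize the loop itself, one option is a worker pool from the standard library. The sketch below uses a deterministic stand-in for the `lgb.train`/`model.predict` step (those calls, and names like `lgbtrain` and `test_df`, come from the question and are not runnable here), so only the pooling pattern is demonstrated:

```python
from concurrent.futures import ThreadPoolExecutor

ENSEMBLE_N = 4
N_SAMPLES = 3  # stand-in for len(test_df)

def train_and_predict(seed):
    # In the real loop this body would be:
    #     params = {'seed': seed}
    #     model = lgb.train(params, lgbtrain)
    #     return model.predict(test_df.drop([target], axis=1))
    # A deterministic stand-in keeps the sketch self-contained.
    return [seed * 0.1] * N_SAMPLES

with ThreadPoolExecutor(max_workers=ENSEMBLE_N) as pool:
    # map preserves the seed order, so column m corresponds to seed m
    columns = list(pool.map(train_and_predict, range(ENSEMBLE_N)))

# Transpose so rows are samples and columns are ensemble members,
# matching the test_predictions[:, m] layout from the question.
test_predictions = [list(row) for row in zip(*columns)]
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` avoids GIL contention for pure-Python work, but each worker process then holds its own copy of the training data, which is exactly the memory cost mentioned above.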

At the same time, if you're using the scikit-learn API of LGBM, you can use the parameter n_jobs=-1, which will parallelize the calculations of a single model over all available cores. This is often a more efficient use of resources, because either way you will have to choose between training multiple models in parallel and training a single model in parallel, but not both.
