
Is there a way to parallelize a loop for ensemble learning in Python?

I want to train multiple LightGBM models simultaneously.

Right now, I'm training them sequentially like below:

for m in range(ensemble_n):
    params = {'seed': m}
    model = lgb.train(params, lgbtrain)
    prediction = model.predict(test_df.drop([target], axis=1))
    test_predictions[:, m] = prediction

Is there a way for me to parallelize the loop above?

Training multiple versions of a model in parallel comes at a cost: you need multiple copies of the data loaded into memory, which can get difficult if you have a sizeable dataset.
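If you do want to parallelize the loop itself, one option is a worker pool from the standard library. The sketch below uses a deterministic stand-in for the `lgb.train`/`model.predict` step (those calls, and names like `lgbtrain` and `test_df`, come from the question and are not runnable here), so only the pooling pattern is demonstrated:

```python
from concurrent.futures import ThreadPoolExecutor

ENSEMBLE_N = 4
N_SAMPLES = 3  # stand-in for len(test_df)

def train_and_predict(seed):
    # In the real loop this body would be:
    #     params = {'seed': seed}
    #     model = lgb.train(params, lgbtrain)
    #     return model.predict(test_df.drop([target], axis=1))
    # A deterministic stand-in keeps the sketch self-contained.
    return [seed * 0.1] * N_SAMPLES

with ThreadPoolExecutor(max_workers=ENSEMBLE_N) as pool:
    # map preserves the seed order, so column m corresponds to seed m
    columns = list(pool.map(train_and_predict, range(ENSEMBLE_N)))

# Transpose so rows are samples and columns are ensemble members,
# matching the test_predictions[:, m] layout from the question.
test_predictions = [list(row) for row in zip(*columns)]
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` avoids GIL contention for pure-Python work, but each worker process then holds its own copy of the training data, which is exactly the memory cost mentioned above.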

At the same time, if you're using the scikit-learn API of LGBM, you can use the parameter n_jobs=-1, which will parallelize the calculations of a single model over all available cores. This is often a more efficient use of resources, because either way you will have to choose between training multiple models in parallel and training a single model in parallel, but not both.
