
Building an ensemble of Random Forest models with VotingClassifier()

I'm trying to build an ensemble of models using VotingClassifier() from scikit-learn to see if it performs better than the individual models. I'm trying it in two different ways:

  1. I'm trying to do it with individual Random Forest, Gradient Boosting, and XGBoost models.
  2. I'm trying to build it using an ensemble of many Random Forest models (using different parameters for n_estimators and max_depth).

In the first case, I'm doing this:

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
import xgboost as xgb

estimator = []
estimator.append(('RF', RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=8, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=900,
                       n_jobs=-1, oob_score=True, random_state=66, verbose=0,
                       warm_start=True)))
estimator.append(('GB', GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,
                           learning_rate=0.03, loss='deviance', max_depth=5,
                           max_features=None, max_leaf_nodes=None,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, n_estimators=1000,
                           n_iter_no_change=None, presort='deprecated',
                           random_state=66, subsample=1.0, tol=0.0001,
                           validation_fraction=0.1, verbose=0,
                           warm_start=False)))
estimator.append(('XGB', xgb.XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=9,
              min_child_weight=1, n_estimators=1000, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)))

And when I do

ensemble_model_churn = VotingClassifier(estimators=estimator, voting='soft')

and display ensemble_model_churn, I get everything in the output.

But in the second case, I'm doing this:

estimator = []
estimator.append(('RF_1',RandomForestClassifier(n_estimators=500,max_depth=5,warm_start=True)))
estimator.append(('RF_2',RandomForestClassifier(n_estimators=500,max_depth=6,warm_start=True)))
estimator.append(('RF_3',RandomForestClassifier(n_estimators=500,max_depth=7,warm_start=True)))
estimator.append(('RF_4',RandomForestClassifier(n_estimators=500,max_depth=8,warm_start=True)))

And so on. I have 30 different models like that.
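Since the 30 estimators differ only in their hyperparameters, the list can also be built in a loop instead of appending each one by hand. A minimal sketch (the exact depth range 5..34 is an illustrative assumption, not taken from the question):

```python
from sklearn.ensemble import RandomForestClassifier

# Build 30 (name, model) pairs, one per max_depth value.
# The depth range here is an assumption for illustration only.
estimator = []
for i, depth in enumerate(range(5, 35), start=1):
    estimator.append(
        (f'RF_{i}',
         RandomForestClassifier(n_estimators=500, max_depth=depth,
                                warm_start=True))
    )
```

Each name must be unique, since VotingClassifier uses the names as keys (e.g. for set_params).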

But this time, when I do

ensemble_model_churn = VotingClassifier(estimators=estimator, voting='soft')

and display it, I only get the first one, not the others.

print(ensemble_model_churn)
>>>VotingClassifier(estimators=[('RF_1',
                              RandomForestClassifier(bootstrap=True,
                                                     ccp_alpha=0.0,
                                                     class_weight=None,
                                                     criterion='gini',
                                                     max_depth=5,
                                                     max_features='auto',
                                                     max_leaf_nodes=None,
                                                     max_samples=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
                                                     n_estimators=500,
                                                     n_jobs=None,
                                                     oob_score=...
                                                     criterion='gini',
                                                     max_depth=5,
                                                     max_features='auto',
                                                     max_leaf_nodes=None,
                                                     max_samples=None,
                                                     min_impurity_decrease=0.0,
                                                     min_impurity_split=None,
                                                     min_samples_leaf=1,
                                                     min_samples_split=2,
                                                     min_weight_fraction_leaf=0.0,
                                                     n_estimators=500,
                                                     n_jobs=None,
                                                     oob_score=False,
                                                     random_state=None,
                                                     verbose=0,
                                                     warm_start=True))],
                 flatten_transform=True, n_jobs=None, voting='soft',
                 weights=None)

Why is this happening? Is it not possible to build an ensemble out of copies of the same model?

You are seeing more than one of the estimators; it's just a little hard to tell. Notice the ellipsis ( ... ) after the first oob_score parameter, and that after it some of the hyperparameters are repeated. scikit-learn caps the printed representation of an estimator and trims out most of the middle rather than printing such a giant wall of text. You can check that len(ensemble_model_churn.estimators) > 1.
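A quick way to confirm that all of the estimators are really there is to check the length of estimators, and of estimators_ after fitting. A minimal sketch on synthetic data (five small forests instead of 30 large ones, so it runs quickly; the sizes are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Five small forests that differ only in max_depth.
estimator = [(f'RF_{i}', RandomForestClassifier(n_estimators=10,
                                                max_depth=d,
                                                random_state=0))
             for i, d in enumerate(range(3, 8), start=1)]

ensemble = VotingClassifier(estimators=estimator, voting='soft')

# All five are present even though print(ensemble) elides the middle.
print(len(ensemble.estimators))   # 5

ensemble.fit(X, y)
print(len(ensemble.estimators_))  # 5 fitted clones
```

After fit, estimators_ holds the fitted clones of every model that was passed in, which is the direct evidence that nothing was dropped.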

Another note: scikit-learn deliberately avoids doing any validation at model initialization, preferring to do such checking at fit time. (This is because of the way estimators are cloned in grid searches and the like.) So it's very unlikely that anything will be changed from your explicit input until you call fit.
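This lazy validation is easy to observe: an obviously invalid hyperparameter is accepted silently at construction time and only rejected when fit runs its checks. A small sketch (the exact error message varies between scikit-learn versions):

```python
from sklearn.ensemble import RandomForestClassifier

# An invalid value is accepted at construction time without complaint...
clf = RandomForestClassifier(max_depth=-1)

# ...and only rejected once fit() performs parameter validation.
try:
    clf.fit([[0], [1], [0], [1]], [0, 1, 0, 1])
except ValueError as err:
    print('raised at fit time:', err)
```

The same principle is why VotingClassifier stores your estimators list exactly as given and only builds the fitted estimators_ attribute when fit is called.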
