
Sklearn: How to make an ensemble for two binary classifiers?

I have two classifiers for a multimedia dataset: one for visual material and one for textual material. I want to combine the predictions of these classifiers to make a final prediction. I have been reading about bagging, boosting and stacking ensembles; they all seem useful and I would like to try them. However, I can only seem to find rather theoretical examples for my specific problem, nothing concrete enough for me to understand how to actually implement it (in Python with scikit-learn). My two classifiers both use 10-fold (KFold) CV with SVM classification. Both output a list of n_samples = 1000 predictions (either 1's or 0's). I also made them both produce the list of probabilities on which the predictions are based, which looks like this:

 [[ 0.96761819  0.03238181]
 [ 0.96761819  0.03238181]
  ....
 [ 0.96761819  0.03238181]
 [ 0.96761819  0.03238181]]

How would I go about combining these in an ensemble? What should I use as input? I've tried concatenating the label predictions horizontally and feeding them in as features, but with no luck (and the same for the probabilities).
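As a concrete illustration of the stacking idea described above (not part of the original question): the out-of-fold probabilities from both classifiers can be used as meta-features for a second-level classifier. A minimal sketch with made-up data and a logistic-regression meta-classifier (all names and numbers here are illustrative assumptions):

# Sketch of stacking the two probability outputs (illustrative only:
# the probability arrays are random stand-ins, and the logistic-regression
# meta-classifier is an assumption, not part of the original question).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
visual_proba = rng.dirichlet([1, 1], size=1000)  # stand-in for the visual classifier's predict_proba
text_proba = rng.dirichlet([1, 1], size=1000)    # stand-in for the textual classifier's predict_proba
y = rng.randint(0, 2, size=1000)                 # stand-in for the true labels

# Use the positive-class probability from each base classifier as a feature.
X_meta = np.column_stack([visual_proba[:, 1], text_proba[:, 1]])

meta_clf = LogisticRegression()
print(cross_val_score(meta_clf, X_meta, y, cv=10).mean())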

If you're looking strictly for combining classifiers, I recommend using brew, because it is built on top of sklearn (meaning that you can use your sklearn classifiers) and, last time I checked, sklearn was good for creating ensembles (Bagging, AdaBoost, RandomForest, ...) but did not provide many combining rules for your own custom ensembles (such as hybrid ensembles).

https://github.com/viisar/brew

from brew.base import Ensemble
from brew.base import EnsembleClassifier
from brew.combination.combiner import Combiner

# create your Ensemble
clfs = your_list_of_classifiers # [clf1, clf2]
ens = Ensemble(classifiers=clfs)

# create your Combiner
# the rules can be 'majority_vote', 'max', 'min', 'mean' or 'median'
comb = Combiner(rule='mean')

# now create your ensemble classifier
ensemble_clf = EnsembleClassifier(ensemble=ens, combiner=comb)
ensemble_clf.predict(X)
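For intuition, the 'mean' rule above amounts (as I understand it) to averaging the per-class probabilities of the base classifiers and picking the class with the highest average. A plain-numpy sketch of that idea, with made-up numbers rather than the question's real outputs:

import numpy as np

# Two (n_samples, 2) predict_proba arrays, as in the question (values are illustrative).
visual_proba = np.array([[0.97, 0.03], [0.40, 0.60]])
text_proba = np.array([[0.80, 0.20], [0.10, 0.90]])

avg_proba = (visual_proba + text_proba) / 2.0
final_pred = avg_proba.argmax(axis=1)  # -> array([0, 1])
print(final_pred)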

It depends entirely on the ensemble method you want to implement. Have you taken a look at the sklearn.ensemble documentation?

http://scikit-learn.org/stable/modules/classes.html#module-sklearn.ensemble

There is a classifier called VotingClassifier in sklearn.ensemble which can be used to combine multiple classifiers; the predicted labels are based on voting among the enlisted classifiers. Here is an example:
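(The original answer's snippet is not preserved on this page; the following is a minimal sketch of VotingClassifier usage with two SVMs and synthetic data. The kernels, estimator names, and parameters are illustrative assumptions, not the answer's own code.)

# Minimal VotingClassifier sketch (synthetic data; names and parameters are illustrative).
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# probability=True is needed for voting='soft' (averaging predict_proba);
# use voting='hard' for a plain majority vote on the predicted labels.
clf1 = SVC(kernel='linear', probability=True, random_state=0)
clf2 = SVC(kernel='rbf', probability=True, random_state=0)

eclf = VotingClassifier(
    estimators=[('svm_linear', clf1), ('svm_rbf', clf2)],
    voting='soft',
)

scores = cross_val_score(eclf, X, y, cv=10)  # 10-fold CV, as in the question
print(scores.mean())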
