Is it possible to fit one specific estimator out of an ensemble VotingClassifier?

This is my first question here; please let me know if I am doing something wrong!

I used sklearn to build an ensemble VotingClassifier that contains 3 different estimators. I first fit all 3 with the same data by calling: est.fit()
This first dataset is small because fitting 2 of the 3 estimators is very time-consuming.

Now I want to fit the third estimator again with different data. Is there a way to achieve this?

I tried accessing the estimator like this: ens.estimators_[2].fit(X_largedata, y_largedata)
This does not throw an error, but I am not sure whether it fits a copy of the estimator or the one that is actually part of the ensemble.
Calling ens.predict(X_test) afterwards results in the following error (predict works fine if I don't try to fit the third estimator):

ValueError                                Traceback (most recent call last)
<ipython-input-438-65c955f40b01> in <module>
----> 1 pred_ens2 = ens.predict(X_test_ens2)
      2 print(ens.score(X_test_ens2, y_test_ens2))
      3 confusion_matrix(pred_ens2, y_test_ens2).ravel()

~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in predict(self, X)
    280         check_is_fitted(self)
    281         if self.voting == 'soft':
--> 282             maj = np.argmax(self.predict_proba(X), axis=1)
    283 
    284         else:  # 'hard' voting

~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in _predict_proba(self, X)
    300         """Predict class probabilities for X in 'soft' voting."""
    301         check_is_fitted(self)
--> 302         avg = np.average(self._collect_probas(X), axis=0,
    303                          weights=self._weights_not_none)
    304         return avg

~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in _collect_probas(self, X)
    295     def _collect_probas(self, X):
    296         """Collect results from clf.predict calls."""
--> 297         return np.asarray([clf.predict_proba(X) for clf in self.estimators_])
    298 
    299     def _predict_proba(self, X):

~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in <listcomp>(.0)
    295     def _collect_probas(self, X):
    296         """Collect results from clf.predict calls."""
--> 297         return np.asarray([clf.predict_proba(X) for clf in self.estimators_])
    298 
    299     def _predict_proba(self, X):

~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
    117 
    118         # lambda, but not partial, allows help() to work with update_wrapper
--> 119         out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
    120         # update the docstring of the returned function
    121         update_wrapper(out, self.fn)

~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/pipeline.py in predict_proba(self, X)
    461         Xt = X
    462         for _, name, transform in self._iter(with_final=False):
--> 463             Xt = transform.transform(Xt)
    464         return self.steps[-1][-1].predict_proba(Xt)
    465 

~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
    596             if (n_cols_transform >= n_cols_fit and
    597                     any(X.columns[:n_cols_fit] != self._df_columns)):
--> 598                 raise ValueError('Column ordering must be equal for fit '
    599                                  'and for transform when using the '
    600                                  'remainder keyword')

ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword


EDIT: I fixed the error. It was caused by the small dataset having more columns than the big one. This is probably a problem because, when fitting the first time with the small dataset, the transformers are told that those columns will be present. Once both datasets had the same columns (and column order), it worked. It seems this is the right way to train only one specific estimator, but please let me know if there is a better way or if you think I am wrong.
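A minimal sketch of that column fix, assuming both datasets are pandas DataFrames. The frames and column names here (X_small, X_large, "a", "b", "extra") are hypothetical stand-ins, not the real data from the question. The idea is to restrict both frames to the shared columns, in one fixed order, before fitting anything:

```python
import pandas as pd

# Hypothetical stand-ins for the small and the large dataset.
X_small = pd.DataFrame({"a": [1, 2], "b": [3, 4], "extra": [5, 6]})
X_large = pd.DataFrame({"b": [30, 40, 50], "a": [10, 20, 30]})

# Columns present in both frames, in the order used by the small frame.
shared = [col for col in X_small.columns if col in X_large.columns]

# Select the same columns in the same order from both frames.
X_small_aligned = X_small[shared]
X_large_aligned = X_large[shared]

print(list(X_small_aligned.columns))
print(list(X_large_aligned.columns))
```

With the frames aligned like this, the transformers see the same column layout when the whole ensemble is fit, when a single estimator is refit, and at predict time.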

It seems that the individual classifiers are stored in a list that can be accessed with .estimators_. The individual entries of this list are classifiers that have the .fit method. Here is an example with logistic regression:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

X1, y1 = make_classification(random_state=1)
X2, y2 = make_classification(random_state=2)


clf1 = LogisticRegression(random_state=1)
clf2 = LogisticRegression(random_state=2)
clf3 = LogisticRegression(random_state=3)


voting = VotingClassifier(estimators=[
    ('a', clf1),
    ('b', clf2),
    ('c', clf3),
])

# fit all
voting = voting.fit(X1,y1)

# fit individual one
voting.estimators_[-1].fit(X2,y2)
voting.predict(X2)
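To check that this refits the estimator that is actually part of the ensemble rather than a copy, one can compare the fitted coefficients before and after. The following is a sketch under the same setup as above; the variable names (third, coef_before and so on) are made up for illustration. It also uses named_estimators_, which gives access to the same fitted objects by name:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression

X1, y1 = make_classification(random_state=1)
X2, y2 = make_classification(random_state=2)

voting = VotingClassifier(estimators=[
    ('a', LogisticRegression()),
    ('b', LogisticRegression()),
    ('c', LogisticRegression()),
])
voting.fit(X1, y1)

# Remember the fitted coefficients before touching the third estimator.
third = voting.estimators_[-1]
coef_before = third.coef_.copy()
first_coef_before = voting.estimators_[0].coef_.copy()

# Refit only the third estimator on the other dataset.
third.fit(X2, y2)

# The object reachable by name is the very same object that was refit.
same_object = voting.named_estimators_['c'] is voting.estimators_[-1]
# Its coefficients inside the ensemble have changed ...
coef_changed = not np.allclose(coef_before, voting.estimators_[-1].coef_)
# ... while the first estimator was left alone.
first_unchanged = np.array_equal(first_coef_before, voting.estimators_[0].coef_)

print(same_object, coef_changed, first_unchanged)
```

Because fit modifies the estimator object in place, a later voting.predict call picks up the refitted third estimator.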

Edit: difference between estimators and estimators_

.estimators

This is a list of tuples, with the form (name, estimator):

for e in voting.estimators:
    print(e)

('a', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=1, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False))
('b', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=2, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False))
('c', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=3, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False))

.estimators_

This is just a list of estimators, without the names:

for e in voting.estimators_:
    print(e)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=1, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=2, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=3, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

Interestingly, though, voting.estimators[0][1] == voting.estimators_[0] evaluates to False, so the entries are not the same objects. This is expected: fit clones each estimator in .estimators and stores the fitted clones in .estimators_, leaving the originals unfitted.

The predict method of the voting classifier uses the .estimators_ list.

Check lines 295 - 323 of the source.
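The relationship between the two lists can also be checked directly. This is a small sketch; the variable names (original, fitted and so on) are made up for illustration. It shows that the entry in estimators is the untouched object that was passed in, while the entry in estimators_ is a separate, fitted object with the same parameters:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=1)

clf1 = LogisticRegression(random_state=1)
clf2 = LogisticRegression(random_state=2)
voting = VotingClassifier(estimators=[('a', clf1), ('b', clf2)])
voting.fit(X, y)

original = voting.estimators[0][1]   # the object passed to the constructor
fitted = voting.estimators_[0]       # the object used by predict

is_passed_in_object = original is clf1
is_same_object = original is fitted
original_is_fitted = hasattr(original, 'coef_')
fitted_is_fitted = hasattr(fitted, 'coef_')
same_params = original.get_params() == fitted.get_params()

print(is_passed_in_object, is_same_object,
      original_is_fitted, fitted_is_fitted, same_params)
```

So refitting an entry of estimators would not affect predictions; only the objects in estimators_ matter after the ensemble has been fit.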
