Is it possible to fit one specific estimator out of an ensemble VotingClassifier?
This is my first question here, so please let me know if I am doing something wrong!
I used sklearn to build an ensemble VotingClassifier that contains 3 different estimators. I first fit all 3 with the same data by calling: ens.fit()
This first dataset is small because fitting 2 of the 3 estimators is very time-consuming.
Now I want to fit the third estimator again with different data. Is there a way to achieve this?
I tried accessing the estimator like this: ens.estimators_[2].fit(X_largedata, y_largedata)
This does not throw an error, but I am not sure whether it fits a copy of the estimator or the one that is actually part of the ensemble.
Calling ens.predict(X_test) afterwards now results in the following error (predict works fine if I don't try to fit the 3rd estimator):
ValueError Traceback (most recent call last)
<ipython-input-438-65c955f40b01> in <module>
----> 1 pred_ens2 = ens.predict(X_test_ens2)
2 print(ens.score(X_test_ens2, y_test_ens2))
3 confusion_matrix(pred_ens2, y_test_ens2).ravel()
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in predict(self, X)
280 check_is_fitted(self)
281 if self.voting == 'soft':
--> 282 maj = np.argmax(self.predict_proba(X), axis=1)
283
284 else: # 'hard' voting
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in _predict_proba(self, X)
300 """Predict class probabilities for X in 'soft' voting."""
301 check_is_fitted(self)
--> 302 avg = np.average(self._collect_probas(X), axis=0,
303 weights=self._weights_not_none)
304 return avg
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in _collect_probas(self, X)
295 def _collect_probas(self, X):
296 """Collect results from clf.predict calls."""
--> 297 return np.asarray([clf.predict_proba(X) for clf in self.estimators_])
298
299 def _predict_proba(self, X):
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in <listcomp>(.0)
295 def _collect_probas(self, X):
296 """Collect results from clf.predict calls."""
--> 297 return np.asarray([clf.predict_proba(X) for clf in self.estimators_])
298
299 def _predict_proba(self, X):
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
117
118 # lambda, but not partial, allows help() to work with update_wrapper
--> 119 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
120 # update the docstring of the returned function
121 update_wrapper(out, self.fn)
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/pipeline.py in predict_proba(self, X)
461 Xt = X
462 for _, name, transform in self._iter(with_final=False):
--> 463 Xt = transform.transform(Xt)
464 return self.steps[-1][-1].predict_proba(Xt)
465
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
596 if (n_cols_transform >= n_cols_fit and
597 any(X.columns[:n_cols_fit] != self._df_columns)):
--> 598 raise ValueError('Column ordering must be equal for fit '
599 'and for transform when using the '
600 'remainder keyword')
ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword
EDIT: I fixed the error. It was caused by the small dataset having more columns than the big one. This is probably a problem because, when fitting the first time with the small dataset, the transformers are told that those columns will be present. Once both datasets had the same columns (and column order), it worked. It seems this is the right way to train only one specific estimator, but please let me know if there is a better way or if you think I am wrong.
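The column fix described in the edit can be sketched roughly as follows. This is only an illustration, assuming the data lives in pandas DataFrames: the helper name align_columns and the toy frames X_small and X_large are hypothetical and not taken from the original code.

```python
import pandas as pd


def align_columns(df, reference_columns):
    """Return df restricted to reference_columns, in exactly that order.

    Hypothetical helper: raises if a reference column is absent instead of
    silently filling it with NaN.
    """
    missing = [c for c in reference_columns if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df.loc[:, list(reference_columns)]


# Toy stand-ins for the small and large datasets (assumed shapes and names).
X_small = pd.DataFrame({"a": [1, 2], "b": [3, 4], "extra": [5, 6]})
X_large = pd.DataFrame({"b": [7, 8, 9], "a": [10, 11, 12]})

# Keep only the columns both datasets share, in one fixed order.
shared = [c for c in X_small.columns if c in X_large.columns]
X_small_aligned = align_columns(X_small, shared)
X_large_aligned = align_columns(X_large, shared)
```

Doing this before the first fit means every transformer in every pipeline sees the same columns in the same order, both when fitting and when predicting.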
So, it seems that the individual classifiers are stored in a list that can be accessed with .estimators_. The individual entries of this list are classifiers that have the .fit method. So, an example with logistic regression:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier
X1, y1 = make_classification(random_state=1)
X2, y2 = make_classification(random_state=2)
clf1 = LogisticRegression(random_state=1)
clf2 = LogisticRegression(random_state=2)
clf3 = LogisticRegression(random_state=3)
voting = VotingClassifier(estimators=[
('a', clf1),
('b', clf2),
('c', clf3),
])
# fit all
voting = voting.fit(X1,y1)
# fit individual one
voting.estimators_[-1].fit(X2,y2)
voting.predict(X2)
EDIT: the difference between estimators and estimators_
voting.estimators is a list of tuples, with the form (name, estimator):
for e in voting.estimators:
    print(e)
('a', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=1, solver='warn', tol=0.0001, verbose=0,
warm_start=False))
('b', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=2, solver='warn', tol=0.0001, verbose=0,
warm_start=False))
('c', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=3, solver='warn', tol=0.0001, verbose=0,
warm_start=False))
voting.estimators_ is just a list of estimators, without the names:
for e in voting.estimators_:
    print(e)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=1, solver='warn', tol=0.0001, verbose=0,
warm_start=False)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=2, solver='warn', tol=0.0001, verbose=0,
warm_start=False)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=3, solver='warn', tol=0.0001, verbose=0,
warm_start=False)
However, voting.estimators[0][1] == voting.estimators_[0] evaluates to False, so the entries are not the same objects. This is expected: fit clones each estimator passed in estimators and stores the fitted clones in estimators_.
The predict method of the voting classifier uses the .estimators_ list.
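To check this directly, here is a small sketch that reuses the logistic regression setup from the example above. The variable names (original_c, fitted_c and so on) are only illustrative. It compares the unfitted estimator passed to the constructor with the fitted one in estimators_, then refits the latter in place.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression

X1, y1 = make_classification(random_state=1)
X2, y2 = make_classification(random_state=2)

voting = VotingClassifier(
    estimators=[
        ("a", LogisticRegression(random_state=1)),
        ("b", LogisticRegression(random_state=2)),
        ("c", LogisticRegression(random_state=3)),
    ],
    voting="soft",
)
voting.fit(X1, y1)

original_c = voting.estimators[2][1]  # object passed to the constructor
fitted_c = voting.estimators_[2]      # fitted clone that predict() uses

is_same_object = original_c is fitted_c            # clone, so a different object
original_is_fitted = hasattr(original_c, "coef_")  # the original stays unfitted

coef_before = fitted_c.coef_.copy()
fitted_c.fit(X2, y2)  # refit in place on the second dataset

# The list still holds the very same object, now with new coefficients.
same_after_refit = voting.estimators_[2] is fitted_c
coef_changed = not np.allclose(coef_before, voting.estimators_[2].coef_)

# named_estimators_ gives access to the same fitted object by name.
same_by_name = voting.named_estimators_["c"] is fitted_c

predictions = voting.predict(X2)
```

Because fit on a scikit-learn estimator modifies that object in place, refitting voting.estimators_[2] changes the estimator the ensemble actually predicts with, not a copy. One caveat: VotingClassifier encodes the class labels internally before fitting its sub-estimators, so when the labels are not already 0..n-1 integers, it is probably safer to pass voting.le_.transform(y) to the individual fit call.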