Is it possible to fit one specific estimator out of an ensemble VotingClassifier?
This is my first question here, so please let me know if I'm doing anything wrong!
I built an ensemble VotingClassifier with 3 different estimators using sklearn. I first fit all 3 of them to the same data by calling: est.fit()
This first dataset is small, because fitting 2 of the 3 estimators is very time-consuming.
Now I want to fit the third estimator again, on different data. Is there a way to do this?
I tried accessing the estimator like this: ens.estimators_[2].fit(X_largedata, y_largedata)
This does not throw an error, but I'm not sure whether it fits a copy of the estimator or the one that is actually part of the ensemble.
Calling ens.predict(X_test) now results in the following error (prediction works fine if I don't try to fit the third estimator):
ValueError Traceback (most recent call last)
<ipython-input-438-65c955f40b01> in <module>
----> 1 pred_ens2 = ens.predict(X_test_ens2)
2 print(ens.score(X_test_ens2, y_test_ens2))
3 confusion_matrix(pred_ens2, y_test_ens2).ravel()
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in predict(self, X)
280 check_is_fitted(self)
281 if self.voting == 'soft':
--> 282 maj = np.argmax(self.predict_proba(X), axis=1)
283
284 else: # 'hard' voting
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in _predict_proba(self, X)
300 """Predict class probabilities for X in 'soft' voting."""
301 check_is_fitted(self)
--> 302 avg = np.average(self._collect_probas(X), axis=0,
303 weights=self._weights_not_none)
304 return avg
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in _collect_probas(self, X)
295 def _collect_probas(self, X):
296 """Collect results from clf.predict calls."""
--> 297 return np.asarray([clf.predict_proba(X) for clf in self.estimators_])
298
299 def _predict_proba(self, X):
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/ensemble/_voting.py in <listcomp>(.0)
295 def _collect_probas(self, X):
296 """Collect results from clf.predict calls."""
--> 297 return np.asarray([clf.predict_proba(X) for clf in self.estimators_])
298
299 def _predict_proba(self, X):
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
117
118 # lambda, but not partial, allows help() to work with update_wrapper
--> 119 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
120 # update the docstring of the returned function
121 update_wrapper(out, self.fn)
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/pipeline.py in predict_proba(self, X)
461 Xt = X
462 for _, name, transform in self._iter(with_final=False):
--> 463 Xt = transform.transform(Xt)
464 return self.steps[-1][-1].predict_proba(Xt)
465
~/jupyter/lexical/lexical_env/lib/python3.7/site-packages/sklearn/compose/_column_transformer.py in transform(self, X)
596 if (n_cols_transform >= n_cols_fit and
597 any(X.columns[:n_cols_fit] != self._df_columns)):
--> 598 raise ValueError('Column ordering must be equal for fit '
599 'and for transform when using the '
600 'remainder keyword')
ValueError: Column ordering must be equal for fit and for transform when using the remainder keyword
Edit: I fixed the error. It was caused by the small dataset having more columns than the large one (could that itself be a problem?), because when the transformers were first fitted on the small dataset they were told to expect those columns. Once both datasets had the same columns (and the same column order), it worked, so this seems to be the correct way to train only one specific estimator. But please let me know if there is a better way or if you think I'm wrong.
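One way to guard against that column-ordering error up front, assuming both datasets are pandas DataFrames (the frame names below are made up for illustration), is to reindex the second frame to the exact columns the transformers saw during the first fit:

```python
import pandas as pd

# Toy stand-ins for the small (first-fit) and large (refit) datasets
X_small = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
X_large = pd.DataFrame({"c": [11, 12], "a": [9, 10], "b": [7, 8]})

# Reorder the large frame's columns to match the order used at fit time;
# reindex fills any column missing from X_large with NaN
X_large_aligned = X_large.reindex(columns=X_small.columns)

print(list(X_large_aligned.columns))  # ['a', 'b', 'c']
```

This only fixes ordering and missing columns; if the large frame is genuinely missing data for a column, you would still need to decide how to fill it.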
So it seems the individual classifiers are stored in a list that can be accessed via .estimators_. The individual entries of this list are classifiers that have a .fit method. So, taking logistic regression as an example:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier
X1, y1 = make_classification(random_state=1)
X2, y2 = make_classification(random_state=2)
clf1 = LogisticRegression(random_state=1)
clf2 = LogisticRegression(random_state=2)
clf3 = LogisticRegression(random_state=3)
voting = VotingClassifier(estimators=[
    ('a', clf1),
    ('b', clf2),
    ('c', clf3),
])
# fit all
voting = voting.fit(X1, y1)
# fit individual one
voting.estimators_[-1].fit(X2, y2)
voting.predict(X2)
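Since the estimators were given names, recent scikit-learn versions also expose named_estimators_, a Bunch mapping each name to the same fitted clone that estimators_ holds, which avoids fragile positional indexing. A self-contained sketch of the same refit done by name:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

X1, y1 = make_classification(random_state=1)
X2, y2 = make_classification(random_state=2)

voting = VotingClassifier(estimators=[
    ('a', LogisticRegression(random_state=1)),
    ('b', LogisticRegression(random_state=2)),
    ('c', LogisticRegression(random_state=3)),
]).fit(X1, y1)

# 'c' resolves to the fitted clone, i.e. the same object as estimators_[-1],
# so refitting it by name changes what the ensemble predicts with
voting.named_estimators_['c'].fit(X2, y2)
print(voting.named_estimators_['c'] is voting.estimators_[-1])  # True
```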
The difference between estimators and estimators_: estimators is a list of tuples in the format (name, estimator):
for e in voting.estimators:
    print(e)
('a', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=1, solver='warn', tol=0.0001, verbose=0,
warm_start=False))
('b', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=2, solver='warn', tol=0.0001, verbose=0,
warm_start=False))
('c', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=3, solver='warn', tol=0.0001, verbose=0,
warm_start=False))
estimators_ is just a list of estimators, without the names:
for e in voting.estimators_:
    print(e)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=1, solver='warn', tol=0.0001, verbose=0,
warm_start=False)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=2, solver='warn', tol=0.0001, verbose=0,
warm_start=False)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=3, solver='warn', tol=0.0001, verbose=0,
warm_start=False)
However, voting.estimators[0][1] == voting.estimators_[0] evaluates to False, so the entries do not appear to be the same objects. The voting classifier's predict method uses the .estimators_ list.
See lines 295-323 of the source code.
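That split can be checked directly: fit clones each input estimator, so the original object in estimators stays untouched while estimators_ holds the fitted copy that predict() actually uses. A small sketch (variable names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

X1, y1 = make_classification(random_state=1)

clf = LogisticRegression(random_state=1)
voting = VotingClassifier(estimators=[('a', clf)]).fit(X1, y1)

# estimators still holds the original, never-fitted input object ...
print(voting.estimators[0][1] is clf)           # True
print(hasattr(clf, 'coef_'))                    # False: clf was never fitted
# ... while estimators_ holds a separate fitted clone
print(voting.estimators_[0] is clf)             # False
print(hasattr(voting.estimators_[0], 'coef_'))  # True
```

This is why refitting an entry of estimators_ (or named_estimators_) affects the ensemble's predictions, while refitting the object you originally passed in does not.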