
What method does the sklearn VotingClassifier fit use?

The official documentation does not seem to provide this information.

I am wondering why we can't pass already trained models to the VotingClassifier so that we do not need to train them again, since the VotingClassifier requires us to call the fit method before predicting.

Does it just do:

for clf in self.clfs:
    clf.fit(X, y)

or does it use some more interesting folding method?

Here's what VotingClassifier.fit does:

def fit(self, X, y, sample_weight=None):
    ...  # Validates the arguments, estimators, etc.

    self.le_ = LabelEncoder()
    self.le_.fit(y)
    self.classes_ = self.le_.classes_
    self.estimators_ = []

    transformed_y = self.le_.transform(y)

    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
            delayed(_parallel_fit_estimator)(clone(clf), X, transformed_y,
                sample_weight)
                for _, clf in self.estimators)

    return self

... where _parallel_fit_estimator is just a wrapper over estimator.fit call:

def _parallel_fit_estimator(estimator, X, y, sample_weight):
    if sample_weight is not None:
        estimator.fit(X, y, sample_weight)
    else:
        estimator.fit(X, y)
    return estimator

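As you can see, fit() does fit the classifiers (in parallel!), with no folding involved: each one is cloned via clone(clf) and the clone is trained on the full X and y. Because of that cloning, any training already done on the estimators you pass in is not reused. The method also creates the label encoder self.le_ and the self.estimators_ attribute. The predict() and transform() methods are built on top of these attributes, which is why calling fit() first is necessary.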
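The cloning behaviour can be checked directly. The following is a minimal sketch, assuming a recent scikit-learn release; the synthetic dataset and the choice of LogisticRegression and DecisionTreeClassifier are illustrative assumptions, not part of the original answer. It checks for the classes_ attribute, which these two classifiers only set during fit.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Small synthetic dataset, used only for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

lr = LogisticRegression(max_iter=1000)
tree = DecisionTreeClassifier(random_state=0)

voter = VotingClassifier(estimators=[("lr", lr), ("tree", tree)], voting="hard")
voter.fit(X, y)

# fit() trained clones, so the objects passed in are still unfitted.
originals_fitted = [hasattr(est, "classes_") for est in (lr, tree)]

# The fitted copies are stored in estimators_.
copies_fitted = [hasattr(est, "classes_") for est in voter.estimators_]
copies_are_distinct = all(
    orig is not fitted for orig, fitted in zip((lr, tree), voter.estimators_)
)

predictions = voter.predict(X[:5])
```

After running this, originals_fitted should contain only False and copies_fitted only True, which is consistent with fit() working on clones rather than on the estimators supplied by the caller.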
