
GridSearchCV for multi-label classification for each label separately

I am doing multi-label classification with scikit-learn, using RandomForestClassifier as the base estimator, and I want to optimize its parameters for each label with GridSearchCV. Currently I am doing it in the following way:

from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in older versions

parameters = {
    "estimator__n_estimators": [5, 50, 200],
    "estimator__max_depth": [None, 10, 20],
    "estimator__min_samples_split": [2, 5, 10],
}
model_to_tune = OneVsRestClassifier(RandomForestClassifier(random_state=0, class_weight='balanced'))
# 'f1' alone is ambiguous for multi-label targets; use a micro-average over all labels
model_tuned = GridSearchCV(model_to_tune, param_grid=parameters, scoring='f1_micro', n_jobs=2)
model_tuned.fit(X, Y)  # X: features, Y: (n_samples, n_labels) indicator matrix
print(model_tuned.best_params_)
# {'estimator__min_samples_split': 10, 'estimator__max_depth': None, 'estimator__n_estimators': 200}

These are the parameters that give the best f1 score across all labels combined. I want to find the best parameters for each label separately. Is there any built-in function which can do that?

It's not hard to do, though it is not built in, and I'm not sure I understand why you would want to.

Simply pre-process your data like so:

best_params = {}
for a_class in list_of_unique_classes:
    y_this_class = (y_all_class == a_class)  # binary target for this label
    # No OneVsRestClassifier wrapper here, so the grid keys lose their
    # "estimator__" prefix, e.g. {"n_estimators": [5, 50, 200], ...}
    model_to_tune = RandomForestClassifier(random_state=0, class_weight='balanced')
    model_tuned = GridSearchCV(model_to_tune, param_grid=parameters, scoring='f1', n_jobs=2)
    model_tuned.fit(X, y_this_class)

    # Save the best parameters for this class
    best_params[a_class] = model_tuned.best_params_

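For reference, here is a self-contained sketch of that per-label loop on synthetic data. The dataset, the reduced grid, and the `best_params_per_label` name are illustrative assumptions, not from the original post:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic multi-label data: y is an (n_samples, n_labels) indicator matrix
X, y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)

# A small grid to keep the search fast; no "estimator__" prefix because we
# tune the RandomForestClassifier directly, not through OneVsRestClassifier
parameters = {"n_estimators": [10, 50], "min_samples_split": [2, 10]}

best_params_per_label = {}
for label in range(y.shape[1]):
    y_this_label = y[:, label]  # binary target for one label
    search = GridSearchCV(
        RandomForestClassifier(random_state=0, class_weight="balanced"),
        param_grid=parameters, scoring="f1", n_jobs=2,
    )
    search.fit(X, y_this_label)
    best_params_per_label[label] = search.best_params_

print(best_params_per_label)
```

Each label can then be predicted by its own forest, fitted with that label's best parameters.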
(Also, beware the f1 score: it does not describe a classifier's performance well on skewed data sets. Consider ROC curves and/or informedness instead.)
