I am doing multi-label classification using scikit-learn, with RandomForestClassifier as the base estimator, and I want to optimize its parameters for each label using GridSearchCV. Currently I am doing it the following way:
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV  # was sklearn.grid_search in older releases

parameters = {
    "estimator__n_estimators": [5, 50, 200],
    "estimator__max_depth": [None, 10, 20],
    "estimator__min_samples_split": [2, 5, 10],
}

# class_weight='auto' was renamed to 'balanced' in newer scikit-learn
model_to_tune = OneVsRestClassifier(RandomForestClassifier(random_state=0, class_weight="balanced"))
# plain scoring='f1' needs an averaging mode for multi-label targets
model_tuned = GridSearchCV(model_to_tune, param_grid=parameters, scoring="f1_micro", n_jobs=2)
model_tuned.fit(X, y)
print(model_tuned.best_params_)
# {'estimator__min_samples_split': 10, 'estimator__max_depth': None, 'estimator__n_estimators': 200}
These are the parameters that give the best F1 score considering all the labels together. I want to find the best parameters separately for each label. Is there any built-in function that can do that?
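For reference, the `estimator__` prefix in the grid keys is how GridSearchCV routes parameters through the OneVsRestClassifier wrapper to the inner RandomForestClassifier. A quick way to see which prefixed names are valid (a small check, not part of the original post):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier

ovr = OneVsRestClassifier(RandomForestClassifier(random_state=0))
# Anything starting with "estimator__" is forwarded to the wrapped forest.
prefixed = [p for p in ovr.get_params() if p.startswith("estimator__")]
print(sorted(prefixed)[:5])
```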
It's not hard to do that, though it is not built-in and I'm not sure I understand why you would want to.
Simply pre-process your data like so:
# y_all_class is a binary indicator matrix (n_samples x n_labels) for
# multi-label data, so tune one column (one label) at a time.
for class_idx in range(y_all_class.shape[1]):
    y_this_class = y_all_class[:, class_idx]
    model_to_tune = RandomForestClassifier(random_state=0, class_weight="balanced")
    # Drop the "estimator__" prefix from the grid keys: the forest is tuned
    # directly here, without the OneVsRestClassifier wrapper.
    model_tuned = GridSearchCV(model_to_tune, param_grid=parameters, scoring="f1", n_jobs=2)
    model_tuned.fit(X, y_this_class)
    # Save model_tuned.best_params_ for this class
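A self-contained version of that loop, runnable end to end (the synthetic data from `make_multilabel_classification` and the names `best_per_label`, `label_idx` are illustrative placeholders, not from the original post):

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic multi-label data: Y is a binary indicator matrix (200 x 3).
X, Y = make_multilabel_classification(n_samples=200, n_classes=3, random_state=0)

# No "estimator__" prefix: the forest is tuned directly, without a wrapper.
parameters = {
    "n_estimators": [5, 50],
    "max_depth": [None, 10],
}

best_per_label = {}
for label_idx in range(Y.shape[1]):
    # Binary target for this one label: 1 if the sample carries it, else 0.
    y_this_class = Y[:, label_idx]
    model_to_tune = RandomForestClassifier(random_state=0, class_weight="balanced")
    model_tuned = GridSearchCV(model_to_tune, param_grid=parameters,
                               scoring="f1", cv=3, n_jobs=2)
    model_tuned.fit(X, y_this_class)
    best_per_label[label_idx] = model_tuned.best_params_

print(best_per_label)
```

Each entry of `best_per_label` is the grid combination that maximized the per-label F1 score, so different labels can end up with different forest settings.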
(Also, beware the F1 score: it does not do a good job of describing the performance of a classifier on skewed data sets. Consider ROC curves and/or informedness instead.)
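A small illustration of that warning (synthetic numbers, not from the post): on a heavily skewed set, a degenerate classifier that predicts the majority class for every sample can still post a high F1, while informedness (Youden's J = sensitivity + specificity − 1) exposes it as no better than chance.

```python
from sklearn.metrics import f1_score, recall_score

y_true = [1] * 90 + [0] * 10   # 90% positive: heavily skewed
y_pred = [1] * 100             # degenerate "always positive" classifier

f1 = f1_score(y_true, y_pred)
sensitivity = recall_score(y_true, y_pred, pos_label=1)  # TPR = 1.0
specificity = recall_score(y_true, y_pred, pos_label=0)  # TNR = 0.0
informedness = sensitivity + specificity - 1

print(f1)            # high, despite the classifier learning nothing
print(informedness)  # 0.0: no better than chance
```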