
GridSearchCV scoring and grid_scores_

I am trying to understand how to obtain the scorer's values from GridSearchCV. The example code below sets up a small pipeline on text data.

It then sets up a grid search over different n-gram ranges.

Scoring uses the F1 measure:

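# grid_scores_ belongs to the legacy sklearn.grid_search module
# (deprecated in scikit-learn 0.18, removed in 0.20)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV

# set up the pipeline: TF-IDF features feeding a linear SVM
tfidf_vec = TfidfVectorizer(analyzer='word', min_df=0.05, max_df=0.95)
linearsvc = LinearSVC()
clf = Pipeline([('tfidf_vec', tfidf_vec), ('linearsvc', linearsvc)])

# set up the grid search over n-gram ranges, scored with F1 on each CV fold
parameters = {'tfidf_vec__ngram_range': [(1, 1), (1, 2)]}
gs_clf = GridSearchCV(clf, parameters, n_jobs=-1, scoring='f1')
gs_clf = gs_clf.fit(docs_train, y_train)  # docs_train, y_train: training texts and labels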
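Each entry of grid_scores_ corresponds to one parameter combination from parameters, and every combination is cross-validated on each fold. A small sketch of that bookkeeping follows; expand_grid is a hypothetical helper (scikit-learn's own equivalent is ParameterGrid), and the fold count of 3 is an assumption taken from the three per-fold scores in the output shown further down.

```python
from itertools import product


def expand_grid(param_grid):
    """Hypothetical helper: list every combination in a dict-of-lists grid,
    similar in spirit to scikit-learn's ParameterGrid."""
    keys = sorted(param_grid)
    return [dict(zip(keys, values))
            for values in product(*(param_grid[k] for k in keys))]


parameters = {'tfidf_vec__ngram_range': [(1, 1), (1, 2)]}
candidates = expand_grid(parameters)

# Assumption: 3 folds, matching the three scores in cv_validation_scores.
n_folds = 3
n_cv_fits = len(candidates) * n_folds

for params in candidates:
    print(params)
print(len(candidates), "candidates x", n_folds, "folds =", n_cv_fits, "CV fits")
```

Two candidates means two entries in grid_scores_, each holding n_folds scores. With refit=True (the default), one extra fit on the full training set follows the cross-validation.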

Now I can print the scores with:

print(gs_clf.grid_scores_)

[mean: 0.81548, std: 0.01324, params: {'tfidf_vec__ngram_range': (1, 1)},
 mean: 0.82143, std: 0.00538, params: {'tfidf_vec__ngram_range': (1, 2)}]

print(gs_clf.grid_scores_[0].cv_validation_scores)

array([ 0.83234714,  0.8       ,  0.81409002])
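The summary line printed for the first entry can be recomputed from these per-fold values: their plain average and population standard deviation (ddof=0, numpy.std's default) reproduce mean: 0.81548, std: 0.01324. A minimal stdlib sketch follows. One caveat, hedged: in the legacy API the reported mean could be weighted by test-fold size when iid=True (the default then), so exact agreement assumes roughly equal folds.

```python
from statistics import mean, pstdev

# Per-fold F1 scores copied from gs_clf.grid_scores_[0].cv_validation_scores
fold_scores = [0.83234714, 0.8, 0.81409002]

fold_mean = mean(fold_scores)   # unweighted average over the folds
fold_std = pstdev(fold_scores)  # population std, like numpy.std with ddof=0

print("mean: %.5f, std: %.5f" % (fold_mean, fold_std))
# prints: mean: 0.81548, std: 0.01324
```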

It is not clear to me from the documentation:

  1. Is gs_clf.grid_scores_[0].cv_validation_scores an array of the scores defined by the scoring parameter, one per fold (in this case, the F1 score of each fold)? If not, what is it?

  2. If I instead choose another metric, such as scoring='f1_micro', will each array gs_clf.grid_scores_[i].cv_validation_scores contain the f1_micro score of each fold for the corresponding grid-search parameter setting?
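To make question 2 concrete: the scoring string only selects which metric is evaluated on each fold's held-out predictions. Below is a minimal pure-Python sketch of the two metrics named in the question, assuming binary labels with 1 as the positive class (the default pos_label for scoring='f1'). binary_f1 and micro_f1 are hypothetical helpers, not scikit-learn functions, and the labels are made up for illustration. Micro-averaging pools true positives, false positives and false negatives over all classes, so for single-label data it reduces to accuracy.

```python
def _f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def binary_f1(y_true, y_pred, pos_label=1):
    """Hypothetical helper: F1 of the positive class only (like scoring='f1')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == pos_label and p == pos_label)
    fp = sum(1 for t, p in pairs if t != pos_label and p == pos_label)
    fn = sum(1 for t, p in pairs if t == pos_label and p != pos_label)
    return _f1(tp, fp, fn)


def micro_f1(y_true, y_pred):
    """Hypothetical helper: pool TP/FP/FN over every label (like 'f1_micro')."""
    pairs = list(zip(y_true, y_pred))
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        tp += sum(1 for t, p in pairs if t == label and p == label)
        fp += sum(1 for t, p in pairs if t != label and p == label)
        fn += sum(1 for t, p in pairs if t == label and p != label)
    return _f1(tp, fp, fn)


# Illustrative predictions for one held-out fold (made-up labels)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("f1:", binary_f1(y_true, y_pred))        # 0.4 (up to float rounding)
print("f1_micro:", micro_f1(y_true, y_pred))   # 0.625, equal to accuracy
print("accuracy:", accuracy)
```

Under this reading, switching the scoring string changes only which of these numbers is stored per fold in cv_validation_scores.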

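Yes to both: cv_validation_scores holds one score per cross-validation fold, computed with whatever scorer the scoring parameter selects, so with scoring='f1_micro' each array would contain the per-fold micro-averaged F1 scores. I wrote the following function to convert a grid_scores_ object to a pandas.DataFrame. Hopefully the DataFrame view will clear up the confusion, since it's a more intuitive format: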

import pandas as pd

def grid_scores_to_df(grid_scores):
    """
    Convert a sklearn.grid_search.GridSearchCV.grid_scores_ attribute to a tidy
    pandas DataFrame where each row is a hyperparameter-fold combination.
    """
    rows = list()
    for grid_score in grid_scores:
        # one row per fold for this parameter setting
        for fold, score in enumerate(grid_score.cv_validation_scores):
            row = grid_score.parameters.copy()  # copy so the stored params dict is not modified
            row['fold'] = fold
            row['score'] = score
            rows.append(row)
    df = pd.DataFrame(rows)
    return df

The only dependency is pandas (import pandas as pd).
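To see the shape of the rows without pandas or a fitted model, here is the same flattening applied to a stand-in entry. CVScoreTuple is a hypothetical namedtuple mimicking the fields the function reads (legacy grid_scores_ entries exposed parameters, mean_validation_score and cv_validation_scores), and grid_scores_to_rows is a hypothetical pandas-free variant. The values are the first entry from the question's output.

```python
from collections import namedtuple

# Hypothetical stand-in for one legacy grid_scores_ entry
CVScoreTuple = namedtuple(
    "CVScoreTuple",
    ["parameters", "mean_validation_score", "cv_validation_scores"],
)


def grid_scores_to_rows(grid_scores):
    """Same flattening as grid_scores_to_df, returning plain dicts."""
    rows = []
    for grid_score in grid_scores:
        for fold, score in enumerate(grid_score.cv_validation_scores):
            row = dict(grid_score.parameters)  # copy; don't mutate the entry
            row["fold"] = fold
            row["score"] = score
            rows.append(row)
    return rows


# First entry from the question's output
grid_scores = [
    CVScoreTuple(
        {"tfidf_vec__ngram_range": (1, 1)},
        0.81548,
        [0.83234714, 0.8, 0.81409002],
    )
]

rows = grid_scores_to_rows(grid_scores)
for row in rows:
    print(row)
```

Passing such a list of dicts to pd.DataFrame gives one row per (parameter setting, fold) pair, with columns tfidf_vec__ngram_range, fold and score.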
