
Hyperparameter tuning using GridSearchCV

I have a K-nearest neighbour classifier, shown below. From what I understand, GridSearchCV tests the model with different values of k between 1 and 20. When I do y_pred = knn_grid_cv.predict(x_test) I get a set of predictions, but which value of k was used to produce them? Would it be the highest-scoring k value from the GridSearchCV?

x=football_df["Pace"].values.reshape(-1, 1)
print(x)
y=football_df["Position"].values.reshape(-1, 1)  

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.4,random_state=42)

param_grid={"n_neighbors":np.arange(1,20)}  
knn = KNeighborsClassifier()
knn_grid_cv = GridSearchCV(knn, param_grid, cv=5)
knn_grid_cv.fit(x_train,y_train)
y_pred=knn_grid_cv.predict(x_test)
print(y_pred)

You are correct. The param_grid you defined tests 19 different models, one for each value of n_neighbors produced by np.arange(1, 20) (1 through 19; the endpoint 20 is excluded). The best model is the one with the highest average cross-validated score, and for a KNeighborsClassifier the default scoring metric is mean accuracy.

In your case, that is the model with the highest mean accuracy across the five folds. Because refit=True by default, GridSearchCV then refits that best model on the whole training set, and knn_grid_cv.predict(x_test) uses this refitted estimator.
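A minimal sketch to verify that predict goes through the refitted best estimator, assuming the fitted knn_grid_cv and x_test from your code:

# The search object delegates predict to the refitted best estimator,
# so both calls return exactly the same predictions
import numpy as np
print(np.array_equal(knn_grid_cv.predict(x_test),
                     knn_grid_cv.best_estimator_.predict(x_test)))  # True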

To see what value of n_neighbors was chosen, simply do:

# Option 1: print the parameters of the best classifier
print(knn_grid_cv.best_estimator_.get_params())

# Option 2: print results of all model combinations
import pandas as pd
res = pd.DataFrame(knn_grid_cv.cv_results_)
print(res)
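If you only need the winning value of n_neighbors and its score rather than the full results table, the best_params_ and best_score_ attributes are a shorter route. A minimal sketch, assuming the same fitted knn_grid_cv as above:

# Option 3: the chosen hyperparameter and its mean cross-validated accuracy
print(knn_grid_cv.best_params_)  # e.g. {'n_neighbors': ...}
print(knn_grid_cv.best_score_)   # mean accuracy of the best model across the 5 folds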
