I have a K nearest neighbour classifier, shown below. As I understand it, GridSearchCV tests the model with different values of k between 1 and 20. When I call y_pred = knn_grid_cv.predict(x_test),
I get a set of predictions, but which value of k was used to obtain them? Is it the highest-scoring k from the GridSearchCV?
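import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# football_df is assumed to be an already-loaded pandas DataFrame
x = football_df["Pace"].values.reshape(-1, 1)  # features must be 2-D: (n_samples, 1)
print(x)
y = football_df["Position"].values  # labels stay 1-D; a column vector triggers a DataConversionWarning
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=42)
param_grid = {"n_neighbors": np.arange(1, 20)}  # candidate k values 1..19 (the stop value 20 is excluded)
knn = KNeighborsClassifier()
knn_grid_cv = GridSearchCV(knn, param_grid, cv=5)
knn_grid_cv.fit(x_train, y_train)
y_pred = knn_grid_cv.predict(x_test)
print(y_pred)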
You are correct, with one detail: np.arange(1, 20) excludes the upper bound, so param_grid as you defined it tests 19 different models (n_neighbors from 1 to 19), not 20. The best model is the one with the highest mean cross-validated score. For a KNeighborsClassifier, the default scoring metric is accuracy.
In your case, that is the model with the highest mean accuracy across all five folds. Because GridSearchCV uses refit=True by default, it then refits that model on the whole training set, and knn_grid_cv.predict(x_test) uses this refitted model.
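To check this behaviour yourself, here is a minimal sketch. It uses synthetic data from make_classification as a stand-in (an assumption, since the football dataset is not available here), and the names grid, best_row and k_from_scores are only illustrative. It compares the k with the highest mean_test_score against best_params_, and confirms that predict on the search object matches predict on best_estimator_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; not the football dataset from the question)
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

param_grid = {"n_neighbors": np.arange(1, 20)}  # candidates 1..19
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X_train, y_train)

# Row of cv_results_ with the highest mean cross-validated accuracy
best_row = int(np.argmax(grid.cv_results_["mean_test_score"]))
k_from_scores = int(grid.cv_results_["param_n_neighbors"][best_row])

# predict() on the search object delegates to the refitted best estimator
same_predictions = np.array_equal(
    grid.predict(X_test), grid.best_estimator_.predict(X_test)
)

print("k with highest mean CV accuracy:", k_from_scores)
print("k reported by best_params_:", grid.best_params_["n_neighbors"])
print("predictions identical:", same_predictions)
```

The two k values should agree, and the predictions should be identical, because the search object simply forwards predict to the refitted best estimator.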
To see which value of n_neighbors was chosen, simply do:
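# Option 1: print the parameters of the best classifier
print(knn_grid_cv.best_params_)                  # only the tuned parameters (here, n_neighbors)
print(knn_grid_cv.best_estimator_.get_params())  # every parameter of the refitted model
# Option 2: print results of all model combinations
import pandas as pd
res = pd.DataFrame(knn_grid_cv.cv_results_)
print(res)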
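The full cv_results_ table is wide, so it can help to keep only a few columns and sort by rank. The sketch below is one way to do that, again on synthetic stand-in data (an assumption, not the original football data); the variable names grid and summary are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; replace with your own x_train, y_train)
X, y = make_classification(n_samples=300, n_features=4, random_state=0)

grid = GridSearchCV(
    KNeighborsClassifier(), {"n_neighbors": np.arange(1, 20)}, cv=5
)
grid.fit(X, y)

res = pd.DataFrame(grid.cv_results_)

# Keep the columns that answer "which k won, and by how much?"
columns = ["param_n_neighbors", "mean_test_score", "std_test_score", "rank_test_score"]
summary = res[columns].sort_values("rank_test_score", kind="stable")

print(summary.head())
print("best mean CV accuracy:", grid.best_score_)
```

The first row of summary has rank_test_score equal to 1, and its mean_test_score is the value reported by best_score_.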