
KFold cross-validation in sklearn gives different folds each time

I want to implement KFold cross-validation on my model. Since I want to share my results with others, the results need to be the same on every run. I am using an xgboost model as my classification model. However, each time I run my code my performance metrics come out different, and I am confused because I set the shuffle parameter to False. I am also unsure what the random_state parameter does (I read the documentation), but regardless, I tried setting it to a fixed number together with shuffle=False, and that did not help.

from sklearn.model_selection import KFold
from xgboost import XGBClassifier

kf = KFold(n_splits=5, shuffle=False)

for train_index, test_index in kf.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    xgb = XGBClassifier(max_depth=4)
    # ...fit, predict, and compute performance metrics

When you pass a number to the random_state parameter, you fix the seed of the internal random number generator. If you later set it to the same number again, the sequence of random numbers generated will be the same every time, which guarantees the reproducibility of your results, just as you want. Note that KFold only uses random_state when shuffle=True; with shuffle=False the splits are already deterministic, so the run-to-run variation you see comes from the XGBClassifier itself, which has its own random_state parameter that should also be fixed.
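
For example, here is a minimal sketch of a fully reproducible loop; it uses a synthetic dataset from make_classification as a stand-in for your X and y, and fixes random_state in both KFold and XGBClassifier:

# Minimal sketch: pin random_state in both the splitter and the classifier so that
# every run produces identical folds and identical metrics.
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# synthetic stand-in for the question's X and y
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# random_state pins the shuffling of the folds (it only has an effect when shuffle=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for train_index, test_index in kf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # random_state here pins the randomness inside the booster itself
    clf = XGBClassifier(max_depth=4, random_state=42)
    clf.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, clf.predict(X_test)))

print(scores)  # the same list of scores on every run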
