
KFold cross-validation in sklearn gives different folds each time

I want to implement KFold cross-validation on my model. Since I want to share my results with others, the results need to be the same on every run. I am using an xgboost model as my classification model. However, each time I run my code my performance metrics come out different, and I am confused because I set the shuffle parameter to False. I am also unsure what the random_state parameter does (I read the documentation), but regardless, I tried setting it to a fixed number together with shuffle=False, and that did not help.

from sklearn.model_selection import KFold
from xgboost import XGBClassifier

kf = KFold(n_splits=5, shuffle=False)

for train_index, test_index in kf.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    xgb = XGBClassifier(max_depth=4)
    # ...fit, predict, and compute performance metrics

When you pass a number to the random_state parameter, you fix the seed of the internal random number generator. If you later set it to the same number again, the sequence of random numbers generated will be the same every time, which guarantees the reproducibility of your results, just as you want. Note that KFold only uses random_state when shuffle=True; with shuffle=False the splits are already deterministic, so the run-to-run variation you see comes from the XGBClassifier itself, which has its own random_state parameter that should also be fixed.
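
For example, here is a minimal sketch of a fully reproducible loop; it uses a synthetic dataset from make_classification as a stand-in for your X and y, and fixes random_state in both KFold and XGBClassifier:

# Minimal sketch: pin random_state in both the splitter and the classifier so that
# every run produces identical folds and identical metrics.
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# synthetic stand-in for the question's X and y
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# random_state pins the shuffling of the folds (it only has an effect when shuffle=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

scores = []
for train_index, test_index in kf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # random_state here pins the randomness inside the booster itself
    clf = XGBClassifier(max_depth=4, random_state=42)
    clf.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, clf.predict(X_test)))

print(scores)  # the same list of scores on every run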
