
Does cross_val_score take sequential samples or random samples?

In this call: cross_val_score(GaussianNB(), features, target, cv=10)

Is the data split randomly into 10 folds, or is it split sequentially?

This depends on what you specify in the cv parameter.

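If cv is an integer (or left as None), the estimator is a classifier, and the target (the dependent variable) is binary or multiclass, it will use StratifiedKFold; in all other cases it will use KFold. In your call GaussianNB is a classifier, so cv=10 means StratifiedKFold with 10 folds, provided target is binary or multiclass. You can also override this by passing a splitter object (from sklearn or your own) or an iterable of train/test index pairs as cv.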
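As a rough sketch of the options above, the snippet below runs the call from the question three ways. The iris dataset is only a stand-in for features and target, which the question does not show, and the choice of random_state=0 is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Stand-in data (assumption): the question's features/target are not shown.
features, target = load_iris(return_X_y=True)

# cv=10 with a classifier and a multiclass target: stratified folds, no shuffling.
default_scores = cross_val_score(GaussianNB(), features, target, cv=10)

# The same split written out explicitly as a splitter object.
explicit_scores = cross_val_score(
    GaussianNB(), features, target, cv=StratifiedKFold(n_splits=10)
)

# Overriding the default: plain KFold with a reproducible random shuffle.
shuffled_scores = cross_val_score(
    GaussianNB(),
    features,
    target,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
)

print(np.allclose(default_scores, explicit_scores))
print(shuffled_scores.mean())
```

Passing the splitter explicitly is the safer habit when it matters how the folds are built, since the behaviour no longer depends on the type of estimator or target.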

By default, KFold divides the data into sequential folds: consecutive blocks of rows, in the order the data is given. If you want a random split, set the shuffle parameter to True. To make the shuffle reproducible, also set a value for random_state. If you shuffle without setting random_state, the folds will be different every time you run it.
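A small sketch of that behaviour on a toy array of 10 rows (the array and the value random_state=42 are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten rows whose value equals their row index, so folds are easy to read.
X = np.arange(10).reshape(-1, 1)

# Default KFold: test folds are consecutive blocks of rows.
sequential = [test.tolist() for _, test in KFold(n_splits=5).split(X)]
print(sequential)

# shuffle=True with a fixed random_state: random folds, identical on every run.
shuffled_a = [
    test.tolist()
    for _, test in KFold(n_splits=5, shuffle=True, random_state=42).split(X)
]
shuffled_b = [
    test.tolist()
    for _, test in KFold(n_splits=5, shuffle=True, random_state=42).split(X)
]
print(shuffled_a)
print(shuffled_a == shuffled_b)
```

The first print shows the blocks [0, 1], [2, 3], and so on; the shuffled folds still cover every row exactly once, just not in order.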

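StratifiedKFold splits the data while trying to keep the same ratio of classes of the dependent variable in each fold. Because each fold takes its share of samples from every class, a fold is generally not one contiguous block of rows, so the split is not sequential in the KFold sense. It is still deterministic by default (shuffle=False): the folds are the same every time you call it. They only change between runs if you set shuffle=True without fixing random_state.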

