
Manual k-fold cross validation for Random Forest

I am using a Random Forest Classifier and I want to perform k-fold cross validation. My dataset is already split into 10 different subsets, so I'd like to use them for k-fold cross validation, without using automatic functions that randomly split the dataset. Is this possible in Python?

Random Forest doesn't have the partial_fit() method, so I can't do an incremental fit.
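Since the ten subsets are fixed in advance, scikit-learn's PredefinedSplit is a direct fit: you label each sample with the index of the subset it belongs to and pass the splitter as the cv argument. A minimal sketch on synthetic data (the subset assignment here is hypothetical):

```python
# Sketch using scikit-learn's PredefinedSplit: assumes each sample can be
# labelled with the index (0-9) of the pre-made subset it belongs to.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import PredefinedSplit, cross_val_score

X, y = make_classification(n_samples=100, random_state=0)

# test_fold[i] = j means sample i is held out in fold j
test_fold = np.arange(len(X)) % 10  # hypothetical assignment to 10 subsets
ps = PredefinedSplit(test_fold)

# one score per predefined fold; no random re-splitting happens
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=ps)
print(scores.mean())
```

Samples marked with -1 in test_fold would never be placed in a test set, which is handy if part of the data should always stay in training.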

Try kf = StratifiedKFold(n_splits=3, shuffle=True, random_state=123) to split your data while preserving class proportions in each fold.

Try kf = TimeSeriesSplit(n_splits=5) to split by timestamp, or kf = KFold(n_splits=5, shuffle=True, random_state=123) to shuffle your training data before splitting.

for train_index, test_index in kf.split(df):  # StratifiedKFold also needs the labels: kf.split(df, y)
     cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]

     # fit the classifier on cv_train, evaluate on cv_test

You can also split by groupings or categories and compute mean scores for these groupings using k-fold. It is very useful for understanding your data.
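For the group-based splitting mentioned above, a minimal sketch with scikit-learn's GroupKFold (the group labels here are made up for illustration): samples sharing a group label never land in both train and test of the same fold.

```python
# Sketch of group-aware splitting with GroupKFold: all samples of a group
# stay on one side of each train/test split.
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])  # hypothetical group labels

gkf = GroupKFold(n_splits=3)
for train_idx, test_idx in gkf.split(X, y, groups):
    # no group appears in both train and test of the same fold
    assert len(set(groups[train_idx]) & set(groups[test_idx])) == 0
```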

It is usually easiest to join all subsets and let KFold split them, but here is the manual way:

for i in range(10):
   model = what_model_you_want
   train_set = dataset.drop(i_th_subset)   # train on the other 9 subsets
   model.fit(train_set.features, train_set.target)
   prediction = model.predict(i_th_subset.features)
   test_result = compute_accuracy(i_th_subset.target, prediction)

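A runnable version of the loop above, assuming the ten subsets are held as a list of pandas DataFrames with a "target" column (all names and the synthetic data are hypothetical):

```python
# Manual 10-fold cross validation over pre-made subsets: each iteration
# holds out one subset for testing and trains on the other nine.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# stand-in for your 10 existing subsets
subsets = [
    pd.DataFrame({"x1": rng.normal(size=20),
                  "x2": rng.normal(size=20),
                  "target": rng.integers(0, 2, size=20)})
    for _ in range(10)
]

scores = []
for i in range(10):
    test_df = subsets[i]
    # concatenate the other nine subsets into the training set
    train_df = pd.concat(subsets[:i] + subsets[i + 1:], ignore_index=True)
    model = RandomForestClassifier(random_state=0)
    model.fit(train_df.drop(columns="target"), train_df["target"])
    pred = model.predict(test_df.drop(columns="target"))
    scores.append(accuracy_score(test_df["target"], pred))

print(sum(scores) / len(scores))  # mean cross-validated accuracy
```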