
Manual k-fold cross validation for Random Forest

I am using a Random Forest Classifier and I want to perform k-fold cross validation. My dataset is already split into 10 different subsets, so I'd like to use them for k-fold cross validation, without using automatic functions that randomly split the dataset. Is this possible in Python?

Random Forest doesn't have the partial_fit() method, so I can't do an incremental fit.
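Since the ten subsets are fixed in advance, scikit-learn's PredefinedSplit is a direct fit: you label each sample with the index of the subset it belongs to and pass the splitter as the cv argument. A minimal sketch on synthetic data (the subset assignment here is hypothetical):

```python
# Sketch using scikit-learn's PredefinedSplit: assumes each sample can be
# labelled with the index (0-9) of the pre-made subset it belongs to.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import PredefinedSplit, cross_val_score

X, y = make_classification(n_samples=100, random_state=0)

# test_fold[i] = j means sample i is held out in fold j
test_fold = np.arange(len(X)) % 10  # hypothetical assignment to 10 subsets
ps = PredefinedSplit(test_fold)

# one score per predefined fold; no random re-splitting happens
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=ps)
print(scores.mean())
```

Samples marked with -1 in test_fold would never be placed in a test set, which is handy if part of the data should always stay in training.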

Try kf = StratifiedKFold(n_splits=3, shuffle=True, random_state=123) to split your data while preserving class proportions in each fold.

Try kf = TimeSeriesSplit(n_splits=5) to split by timestamp, or kf = KFold(n_splits=5, shuffle=True, random_state=123) to shuffle your training data before splitting.

for train_index, test_index in kf.split(df):  # StratifiedKFold also needs the labels: kf.split(df, y)
     cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]

     # fit the classifier on cv_train, evaluate on cv_test

You can also split by groupings or categories and compute mean scores for these groupings using k-fold. It is very useful for understanding your data.
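For the group-based splitting mentioned above, a minimal sketch with scikit-learn's GroupKFold (the group labels here are made up for illustration): samples sharing a group label never land in both train and test of the same fold.

```python
# Sketch of group-aware splitting with GroupKFold: all samples of a group
# stay on one side of each train/test split.
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])  # hypothetical group labels

gkf = GroupKFold(n_splits=3)
for train_idx, test_idx in gkf.split(X, y, groups):
    # no group appears in both train and test of the same fold
    assert len(set(groups[train_idx]) & set(groups[test_idx])) == 0
```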

It is usually easiest to join all subsets and let KFold split them, but here is the manual way:

for i in range(10):
   model = what_model_you_want
   train_set = dataset.drop(i_th_subset)   # train on the other 9 subsets
   model.fit(train_set.features, train_set.target)
   prediction = model.predict(i_th_subset.features)
   test_result = compute_accuracy(i_th_subset.target, prediction)

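A runnable version of the loop above, assuming the ten subsets are held as a list of pandas DataFrames with a "target" column (all names and the synthetic data are hypothetical):

```python
# Manual 10-fold cross validation over pre-made subsets: each iteration
# holds out one subset for testing and trains on the other nine.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# stand-in for your 10 existing subsets
subsets = [
    pd.DataFrame({"x1": rng.normal(size=20),
                  "x2": rng.normal(size=20),
                  "target": rng.integers(0, 2, size=20)})
    for _ in range(10)
]

scores = []
for i in range(10):
    test_df = subsets[i]
    # concatenate the other nine subsets into the training set
    train_df = pd.concat(subsets[:i] + subsets[i + 1:], ignore_index=True)
    model = RandomForestClassifier(random_state=0)
    model.fit(train_df.drop(columns="target"), train_df["target"])
    pred = model.predict(test_df.drop(columns="target"))
    scores.append(accuracy_score(test_df["target"], pred))

print(sum(scores) / len(scores))  # mean cross-validated accuracy
```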