
Train and test set for an ML algorithm

I have an SVM model that was trained on 33 datasets and evaluated with LOOCV. I have since collected another 13 datasets, which I split in a leave-one-out fashion. In the testing phase, I combine the 33 training datasets with 12 of the new ones, train a model on those 45 datasets, and test on the remaining dataset, repeating this iteratively for each of the new datasets (similar to LOOCV). Is this method of testing right? All the recordings are independent of each other and can be regarded as IID.
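For reference, the scheme described in the question can be sketched in plain Python roughly as follows. This only enumerates which datasets go into training and testing on each iteration. The function name and the integer dataset IDs are hypothetical, used here for illustration only.

```python
def fixed_plus_loo_splits(n_fixed, n_new):
    """Yield (train_ids, test_id) pairs.

    The n_fixed original datasets are always in the training set;
    each of the n_new datasets is held out exactly once.
    """
    fixed = list(range(n_fixed))
    new = list(range(n_fixed, n_fixed + n_new))
    for held_out in new:
        train = fixed + [i for i in new if i != held_out]
        yield train, held_out


splits = list(fixed_plus_loo_splits(33, 13))
print(len(splits))        # iterations: one per new dataset
print(len(splits[0][0]))  # training datasets per iteration
```

Note that under this scheme only the 13 new datasets are ever used for testing; the original 33 always stay in the training set.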

No. LOOCV is generally only used for small datasets, or when you want as accurate an estimate of your model's performance as possible.

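Let's say your train accuracy is 90%; your test accuracy may still be only 50%.
A gap like that indicates overfitting, and with such a large train set and a test set of a single dataset, each individual test score is too noisy to show it reliably.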
[Image: overfitting in ML models]

Assuming your 46 datasets are all the same size, each iteration trains on 45 of them and tests on 1, so your train/test split is roughly 98% / 2%.
The general rule of thumb for a train/test split is 80% / 20%.
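The percentages can be checked with a few lines of arithmetic. This is a small sketch that takes the total as 33 + 13 = 46 datasets from the question and assumes every dataset is the same size; the variable names are illustrative.

```python
n_total = 33 + 13   # original datasets plus newly collected ones
n_test = 1          # leave-one-out holds out a single dataset

train_frac = (n_total - n_test) / n_total
test_frac = n_test / n_total
print(f"leave-one-out split: {train_frac:.1%} / {test_frac:.1%}")

# Under the 80/20 rule of thumb, roughly this many datasets would be held out:
n_test_80_20 = 0.2 * n_total
print(f"20% of {n_total} datasets is about {n_test_80_20:.1f}")
```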

You could use scikit-learn's train_test_split, KFold, StratifiedShuffleSplit, etc. instead.
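A minimal sketch of those three alternatives with scikit-learn might look like this. It assumes each recording has already been reduced to one feature vector with a class label, which the question does not state. The data here is synthetic (make_classification), and the SVC hyperparameters are placeholders rather than recommended values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (
    KFold,
    StratifiedShuffleSplit,
    cross_val_score,
    train_test_split,
)
from sklearn.svm import SVC

# Synthetic stand-in for the 46 recordings (assumption: one row per recording).
X, y = make_classification(n_samples=46, n_features=10, random_state=0)

clf = SVC(kernel="rbf", C=1.0)

# 1) A single 80/20 hold-out split, stratified by class label.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
holdout_acc = clf.fit(X_tr, y_tr).score(X_te, y_te)

# 2) 5-fold cross-validation: every sample is tested exactly once.
kfold_scores = cross_val_score(
    clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)
)

# 3) Ten random stratified 80/20 splits.
sss_scores = cross_val_score(
    clf,
    X,
    y,
    cv=StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0),
)

print(f"hold-out accuracy:      {holdout_acc:.2f}")
print(f"5-fold accuracy:        {kfold_scores.mean():.2f} +/- {kfold_scores.std():.2f}")
print(f"shuffle-split accuracy: {sss_scores.mean():.2f} +/- {sss_scores.std():.2f}")
```

Reporting the mean and spread over several folds, rather than a single score, gives some sense of how stable the estimate is on a sample this small.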
