Train and test set for an ML algorithm

Original post: 2022-05-30 08:05:01 1 1 testing / leave-one-out

I have a model trained with an SVM on 33 datasets and evaluated using LOOCV. I then collected another 13 datasets, which I split in a leave-one-out fashion. In the testing phase, I combine the 33 training datasets with 12 of the 13 new ones, train a model on those 45 datasets, and test it on the one remaining dataset, repeating this iteratively (similar to LOOCV). Is this method of testing right? All the recordings are independent of each other and can be regarded as IID.
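For reference, the procedure described in the question can be sketched as plain index bookkeeping. This is only an illustration of the split, not the asker's actual code: the function name `pooled_leave_one_out` and the convention that indices 0 to 32 are the original recordings and 33 to 45 the new ones are assumptions made for this sketch.

```python
# Hypothetical bookkeeping for the scheme in the question: 33 original
# recordings plus 13 new ones, with one new recording held out per round.
N_ORIGINAL = 33
N_NEW = 13


def pooled_leave_one_out(n_original, n_new):
    """Yield (train_indices, test_index) pairs, holding out one new recording at a time."""
    original = list(range(n_original))
    new = list(range(n_original, n_original + n_new))
    for held_out in new:
        # Train on all original recordings plus every new recording except one.
        train = original + [i for i in new if i != held_out]
        yield train, held_out


folds = list(pooled_leave_one_out(N_ORIGINAL, N_NEW))
for train, held_out in folds[:2]:
    print(f"train on {len(train)} recordings, test on recording {held_out}")
```

In this sketch each of the 13 rounds trains a separate model on 45 recordings, and the 33 original recordings are never held out.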

1 answer

No. LOOCV is generally reserved for small datasets, or for cases where you want an accurate estimate of your model's performance.

Suppose your train accuracy is 90%; your test accuracy may still be only 50%.
A gap like that indicates overfitting, and with such a large train size and such a small test size each individual test estimate is also very noisy.
[Image: overfitting in ML models]

Assuming all 46 of your datasets are the same size, training on 45 and testing on 1 gives a train/test split of roughly 98% / 2%.
The general rule of thumb for a train/test split is 80% / 20%.
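The percentages above follow from simple arithmetic, shown here as a small sketch. The helper name `split_fractions` is made up for this illustration, and the numbers assume 46 equally sized datasets (33 + 13) as in the question.

```python
def split_fractions(n_train, n_test):
    """Return the train and test shares of the combined data as fractions."""
    total = n_train + n_test
    return n_train / total, n_test / total


# Leave-one-out over the pooled data: 45 datasets for training, 1 for testing.
loo_train, loo_test = split_fractions(45, 1)
print(f"leave-one-out: {loo_train:.1%} train / {loo_test:.1%} test")

# An 80/20 split of the same 46 datasets would hold out about 9 of them.
n_holdout = round(0.2 * 46)
print(f"80/20 split: {46 - n_holdout} train / {n_holdout} test")
```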

Instead, you could use scikit-learn's train_test_split, KFold, StratifiedShuffleSplit, etc.
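A minimal sketch of two of these alternatives, assuming scikit-learn is installed. The data is synthetic (`make_classification` stands in for the 46 recordings, one feature vector per recording), and the 5-fold setting, the linear kernel, and the use of `StratifiedKFold` as the k-fold variant are illustrative choices, not something taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 46 recordings (assumption: one feature vector each).
X, y = make_classification(n_samples=46, n_features=8, random_state=0)

# Option 1: a single 80/20 hold-out split, stratified by class label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
holdout_model = SVC(kernel="linear").fit(X_train, y_train)
holdout_accuracy = holdout_model.score(X_test, y_test)

# Option 2: stratified 5-fold cross-validation over all 46 samples.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(SVC(kernel="linear"), X, y, cv=cv)

print(f"hold-out accuracy: {holdout_accuracy:.2f}")
print(f"5-fold accuracies: {fold_scores.round(2)}")
```

With only 46 samples, the per-fold accuracies will vary noticeably from fold to fold, so it is usually more informative to look at their mean and spread than at any single fold.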

weka: train and test set in different format (arff and text format)

WEKA Train and test set are not compatible when classifying boolean data

How to use split large dataset in train/test set but also use pandas batchsize itererations for updating

How to split test and train size

Test cases for algorithm puzzle

When to use Train Validation Test sets

KNN Matlab Train Test Cross-validation

Which one is better? To test by data or test by algorithm?

Non overlapping data in train test validation split python

train_test_split errors with two csv files

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

solution1 1 2022-05-30 08:30:09