
Machine learning parameter tuning using partitioned benchmark dataset

I know this is very basic; however, I'm really confused and would like to understand parameter tuning better.

I'm working on a benchmark dataset that is already partitioned into three splits: training, development, and testing. I would like to tune my classifier's parameters using GridSearchCV from sklearn.

Which partition is the correct one to use for tuning the parameters: the development split or the training split?

I've seen researchers in the literature mention that they "tuned the parameters using GridSearchCV on the development split"; another example is found here.

Do they mean they trained on the training split and then tested on the development split? Or do ML practitioners usually mean they perform the GridSearchCV entirely on the development split?

I'd really appreciate a clarification. Thanks.

Usually, in a 3-way split you train a model on the training set, then validate it on the development set (also called the validation set) to tune hyperparameters, and then, once all tuning is complete, you perform a final evaluation of the model on the previously unseen testing set (also known as the evaluation set).
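With GridSearchCV specifically, you can make it respect a fixed train/development partition instead of its default k-fold cross-validation by passing a PredefinedSplit. Here is a minimal sketch; the random X_train/X_dev/X_test arrays are placeholders for your benchmark's fixed splits, and SVC with a small C grid is just an illustrative estimator:

import numpy as np
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.svm import SVC

# Placeholder arrays standing in for the benchmark's fixed splits.
rng = np.random.RandomState(0)
X_train, y_train = rng.rand(100, 5), rng.randint(0, 2, 100)
X_dev, y_dev = rng.rand(30, 5), rng.randint(0, 2, 30)
X_test, y_test = rng.rand(30, 5), rng.randint(0, 2, 30)

# Stack train + dev; -1 marks rows that are always training data,
# 0 marks rows that form the single validation "fold" (the dev split).
X = np.concatenate([X_train, X_dev])
y = np.concatenate([y_train, y_dev])
test_fold = np.concatenate([np.full(len(X_train), -1),
                            np.zeros(len(X_dev), dtype=int)])

# GridSearchCV now fits every candidate on the training rows and
# scores it on the development rows, instead of doing k-fold CV.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=PredefinedSplit(test_fold))
grid.fit(X, y)

print(grid.best_params_)
# One-time final evaluation on the untouched test split.
print(grid.best_estimator_.score(X_test, y_test))

Note that with the default refit=True, best_estimator_ is refit on train and dev combined before the final test-set evaluation.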

In a two-way split you have only a train set and a test set, so tuning and the final evaluation end up being performed on the same test set.
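In the two-way case it is worth noting GridSearchCV's default behaviour: it performs k-fold cross-validation inside whatever data you pass to fit, so you can tune on the training set alone and keep the test set for the final score. A minimal sketch, with make_classification standing in for real data and the same illustrative SVC grid:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Toy data split two ways only: train and test.
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# With no separate development split, GridSearchCV carves validation
# folds out of the training data itself (5-fold CV by default).
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.best_estimator_.score(X_test, y_test))  # final evaluation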
