
Using a column for cross validation folds

Original post: 2020-04-28 06:32:46. Tags: python, machine-learning, h2o

I have a dataset with more than 100k rows and about 1k columns, including the target column, for a binary classification problem. I am using H2O GBM (latest 3.30xx) in Python with 5-fold cross-validation and an 80-20 train-test split. I have noticed that H2O stratifies the folds automatically, which is good. My problem is that the whole dataset comes from one product, and a separate column identifies the sub-product (group) each row belongs to. Each sub-product has a decent size of 5k to 10k rows, so I thought it would be good to check a separate model on each of them. I would like to know whether I can specify these sub-product groups as the cross-validation folds in H2O model training. Currently I loop over the sub-products and do a train-test split for each one, because from the documentation I have read so far it is not clear to me how else to do it. Is there an option in H2O that lets me use the sub-product column directly for cross-validation? That way I would have fewer model outputs to manage in my scripts.
I hope the question is clear. If not, let me know. Thank you.

1 answer

The fold_column option works. There are some brief examples in the docs: http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html#h2o.grid.H2OGridSearch
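A minimal sketch of how that could look for the sub-product case, not a tested recipe. It assumes the data is in a pandas DataFrame; the helper names (build_fold_ids, train_gbm_with_group_folds) and the default column name fold_id are made up for illustration. Each sub-product is mapped to one integer fold id, and that column is passed as fold_column, so every fold holds out exactly one sub-product. As far as I know, nfolds should not be set at the same time as fold_column.

```python
def build_fold_ids(groups):
    """Map each distinct group label to an integer fold id (first-seen order)."""
    mapping = {}
    for g in groups:
        mapping.setdefault(g, len(mapping))
    return [mapping[g] for g in groups], mapping


def train_gbm_with_group_folds(df, target, group_col, fold_col="fold_id"):
    """Train an H2O GBM whose CV folds follow group_col.

    df is assumed to be a pandas DataFrame; target and group_col are
    column names in it. fold_col is a hypothetical name for the new column.
    """
    # Imported here so build_fold_ids can be used without H2O installed.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()

    df = df.copy()
    fold_ids, _ = build_fold_ids(df[group_col].tolist())
    df[fold_col] = fold_ids

    frame = h2o.H2OFrame(df)
    frame[target] = frame[target].asfactor()  # binary classification target

    # Leaving the group column out of the predictors is a choice made here:
    # the held-out sub-product is never seen during training of that fold.
    predictors = [c for c in frame.columns
                  if c not in (target, group_col, fold_col)]

    model = H2OGradientBoostingEstimator(seed=1)
    # fold_column replaces nfolds; do not pass both.
    model.train(x=predictors, y=target, training_frame=frame,
                fold_column=fold_col)
    return model
```

After training, model.cross_validation_metrics_summary() should give the per-fold metrics in one table, which may replace the manual loop over sub-products.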



solution1 0 2020-04-29 00:57:08