
Using a column for cross validation folds

I have a dataset with more than 100k rows and about 1k columns, including the target column, for a binary classification problem. I am using H2O GBM (latest 3.30xx) in Python with 5-fold cross-validation and an 80-20 train-test split. I have noticed that H2O stratifies the folds automatically, which is good. My problem is that the whole dataset comes from one product, and a separate column identifies the sub-product (group) each row belongs to. Each sub-product has a decent size of 5k to 10k rows, so I thought it would be worth checking a separate model on each of them. I would like to know whether I can specify these sub-product groups for cross-validation in H2O model training. Currently I loop over the sub-products and do a train-test split for each one, because the documentation I have read so far does not make it clear how to do it otherwise. Is there an option in H2O that uses this sub-product column directly for cross-validation? That way I would have fewer model outputs to manage in my scripts.
I hope the question is clear. If not, let me know. Thank you.

The fold_column option works: it takes the name of a column that holds the cross-validation fold assignment for each row. There are some brief examples in the docs: http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html#h2o.grid.H2OGridSearch
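A minimal sketch of how this could look, under some assumptions: the data starts as a pandas DataFrame, and the column names `sub_product`, `target`, and `fold_id`, as well as both helper functions, are hypothetical names chosen for illustration. The sub-product labels are first mapped to integer fold ids, and that column is then passed as `fold_column` instead of setting `nfolds`.

```python
def assign_fold_ids(groups):
    """Map each distinct group label to an integer fold id (0, 1, 2, ...),
    in order of first appearance. Returns (fold_ids, mapping)."""
    mapping = {}
    fold_ids = []
    for g in groups:
        if g not in mapping:
            mapping[g] = len(mapping)
        fold_ids.append(mapping[g])
    return fold_ids, mapping


def train_gbm_with_group_folds(df, target="target", group_col="sub_product",
                               fold_col="fold_id"):
    """Train an H2O GBM whose cross-validation folds follow `group_col`.

    `df` is assumed to be a pandas DataFrame. The column names are
    hypothetical defaults; replace them with your own.
    """
    # Imported here so the fold helper above can be used without H2O installed.
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()

    df = df.copy()
    df[fold_col], _ = assign_fold_ids(df[group_col].tolist())

    hf = h2o.H2OFrame(df)
    hf[target] = hf[target].asfactor()  # binary classification target

    # Leave out the target, the fold column, and the grouping column itself.
    predictors = [c for c in hf.columns if c not in (target, fold_col, group_col)]

    # Do not set nfolds here: the folds come from fold_column.
    model = H2OGradientBoostingEstimator(
        fold_column=fold_col,
        keep_cross_validation_predictions=True,
        seed=42,
    )
    model.train(x=predictors, y=target, training_frame=hf)
    return model
```

After training, `model.cross_validation_metrics_summary()` reports the metrics per fold, which here means per held-out sub-product. Note that this is not the same as fitting a separate model on each sub-product: each cross-validation model is trained on all the other sub-products and scored on the one held out.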
