
How to do cross-validation and hyper-parameter tuning for a huge dataset?

I have a CSV file of 10+ GB. I used the chunksize parameter of pandas.read_csv() to read and pre-process the data, and for training the model I want to use one of the online learning algorithms.
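
A minimal sketch of how I read the file in chunks (the file name, chunk size and pre-processing step below are just placeholders):

import pandas as pd

# Read the 10+ GB file in manageable pieces instead of all at once.
for chunk in pd.read_csv("big_file.csv", chunksize=100_000):
    chunk = chunk.fillna(0)          # example per-chunk pre-processing
    # ... hand the cleaned chunk to an incremental learner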

Normally, cross-validation and hyper-parameter tuning are done on the entire training data set, and the model is then trained with the best hyper-parameters. But with huge data, if I do the same on a chunk of the training data, how should I choose the hyper-parameters?
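
For reference, this is roughly what I mean by the usual in-memory workflow (a sketch using scikit-learn's GridSearchCV; the estimator, the toy data and the grid are only examples):

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Toy data standing in for a training set that fits in memory.
X, y = make_classification(n_samples=1_000, random_state=0)

# 5-fold cross-validation over a small example grid.
search = GridSearchCV(SGDClassifier(random_state=0),
                      {"alpha": [1e-4, 1e-3, 1e-2]}, cv=5)
search.fit(X, y)
print(search.best_params_)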

I believe you are looking for online learning algorithms like the ones mentioned in this link: Scaling Strategies for large datasets. You should use algorithms that support the partial_fit method so that these large datasets can be consumed in chunks. You can also look at the following links to see which one helps you best, since you haven't specified the exact problem or the algorithm you are working on:
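
As a rough sketch of how partial_fit can be combined with chunked reading to compare hyper-parameter candidates (the file name, the "target" column and the alpha grid are assumptions, not something specified in the question):

import pandas as pd
from sklearn.linear_model import SGDClassifier

# One online model per candidate value of alpha.
candidates = {a: SGDClassifier(alpha=a, random_state=0) for a in (1e-4, 1e-3, 1e-2)}
classes = [0, 1]                        # partial_fit needs every class label up front

reader = pd.read_csv("big_file.csv", chunksize=100_000)
val = next(reader)                      # hold out the first chunk for validation
X_val, y_val = val.drop(columns="target"), val["target"]

for chunk in reader:
    X, y = chunk.drop(columns="target"), chunk["target"]
    for model in candidates.values():
        model.partial_fit(X, y, classes=classes)

scores = {a: m.score(X_val, y_val) for a, m in candidates.items()}
print(max(scores, key=scores.get))      # alpha with the best held-out accuracy

Each candidate model sees every training chunk exactly once, and the held-out chunk plays the role of the validation fold you would normally get from cross-validation.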

EDIT: If you want to solve a class imbalance problem, you can try the imbalanced-learn library in Python.
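
A minimal sketch of under-sampling with imbalanced-learn (the toy data below stands in for one of your chunks):

from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Toy imbalanced data; in practice X, y would come from a chunk of the CSV.
X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)

rus = RandomUnderSampler(random_state=0)
X_res, y_res = rus.fit_resample(X, y)   # the two classes are now balanced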
