
How to do cross-validation and hyper-parameter tuning for a huge dataset?

I have a CSV file of 10+ GB. I used the chunksize parameter available in pandas.read_csv() to read and pre-process the data; for training the model I want to use one of the online learning algorithms.
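A minimal sketch of that chunked reading, assuming a hypothetical file name data.csv; the preprocessing step shown is only a placeholder:

    import pandas as pd

    # Read the 10+ GB CSV in pieces instead of loading it all at once.
    # "data.csv" and the chunk size are illustrative placeholders.
    reader = pd.read_csv("data.csv", chunksize=100_000)

    for chunk in reader:
        # Placeholder preprocessing: drop rows with missing values.
        chunk = chunk.dropna()
        # ... hand the cleaned chunk to an online learner (see the answer below) ...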

Normally, cross-validation and hyper-parameter tuning are done on the entire training data set, and the model is then trained using the best hyper-parameters. But in the case of huge data, if I do the same on a chunk of the training data, how do I choose the hyper-parameters?

I believe you are looking for online learning algorithms like the ones mentioned in this link: Scaling Strategies for large datasets. You should use algorithms that support the partial_fit method so these large datasets can be loaded in chunks. You can also look at the following links to see which one helps you best, since you haven't specified the exact problem or the algorithm that you are working on:
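For illustration, here is a minimal sketch of chunk-wise tuning with scikit-learn's SGDClassifier, which supports partial_fit; the file name, the label column, the candidate alpha values, and the idea of holding out the first chunk for validation are assumptions made for this example, not part of the original answer:

    import pandas as pd
    from sklearn.linear_model import SGDClassifier
    from sklearn.metrics import accuracy_score

    # One model per candidate hyper-parameter value (values are illustrative).
    candidates = {alpha: SGDClassifier(alpha=alpha, random_state=0)
                  for alpha in (1e-4, 1e-3, 1e-2)}

    reader = pd.read_csv("data.csv", chunksize=100_000)  # placeholder file name

    # Hold out the first chunk as a fixed validation set (one simple strategy);
    # this assumes the feature columns are numeric and the first chunk
    # contains every class label.
    val = next(reader)
    X_val, y_val = val.drop(columns="label"), val["label"]
    classes = y_val.unique()

    # Stream the remaining chunks through every candidate model.
    for chunk in reader:
        X, y = chunk.drop(columns="label"), chunk["label"]
        for model in candidates.values():
            model.partial_fit(X, y, classes=classes)

    # Choose the hyper-parameter whose model scores best on the held-out chunk.
    scores = {alpha: accuracy_score(y_val, model.predict(X_val))
              for alpha, model in candidates.items()}
    print(scores, max(scores, key=scores.get))

The single held-out chunk could of course be replaced by several validation chunks, but the overall pattern of streaming chunks through a set of candidate models stays the same.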

EDIT: If you want to solve the class imbalance problem, you can try the imbalanced-learn library in Python.
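As an illustration of that library (not taken from the original answer), a short resampling sketch using imbalanced-learn's RandomOverSampler on synthetic data; in practice X and y would come from a preprocessed chunk:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import RandomOverSampler

    # Illustrative 9:1 imbalanced data; real X, y would come from a chunk.
    X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)

    # RandomOverSampler duplicates minority-class rows until classes are balanced.
    X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
    print(Counter(y), "->", Counter(y_res))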
