
Keras: Loading minibatches from HDF5 and CSV

I have a large dataset, too large to fit into RAM, which is available either as HDF5 or CSV. How can I feed it into Keras in minibatches? Also, will this shuffle it for me, or do I need to pre-shuffle the dataset?

(I'm also interested in this when the input is a NumPy recarray, since I believe Keras wants the input to be an ndarray.)

And if I want to do some lightweight preprocessing in Keras before learning (e.g. apply a few Python functions to the data to change the representation), can that be added?

Have a look at the fit_generator method available in Keras (https://keras.io/models/sequential/#sequential-model-methods). It fits the model on data generated batch by batch by a Python generator. Because the generator is under your control, you can write the shuffling logic there.
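As a rough illustration of such a generator, here is a minimal sketch. It assumes the features and labels are stored as two sliceable, equally long arrays (for example h5py datasets, which read only the requested slice from disk). The name minibatch_generator and its parameters are made up for this example and are not part of Keras. It shuffles the order of the batches and the rows inside each batch, which is not a full shuffle, so if the file is sorted (say, by class) pre-shuffling it once is still a good idea.

```python
import numpy as np


def minibatch_generator(features, labels, batch_size, shuffle=True, seed=None):
    """Yield (x, y) batches forever from sliceable, array-like sources.

    `features` and `labels` may be NumPy arrays or on-disk datasets
    (e.g. h5py datasets); only one contiguous slice is loaded at a time.
    This is a hypothetical helper, not a Keras function.
    """
    n_samples = len(features)
    rng = np.random.default_rng(seed)
    starts = np.arange(0, n_samples, batch_size)
    while True:  # fit_generator expects the generator to loop indefinitely
        if shuffle:
            rng.shuffle(starts)  # visit the batches in a new order each epoch
        for start in starts:
            start = int(start)
            stop = min(start + batch_size, n_samples)
            # Contiguous slices keep the disk reads cheap
            x = np.asarray(features[start:stop])
            y = np.asarray(labels[start:stop])
            if shuffle:
                order = rng.permutation(len(x))  # shuffle rows within the batch
                x, y = x[order], y[order]
            yield x, y
```

With an HDF5 file opened through h5py and hypothetical dataset names "X" and "y", this could be passed as model.fit_generator(minibatch_generator(f["X"], f["y"], 32), steps_per_epoch=int(np.ceil(len(f["X"]) / 32)), epochs=10).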

You can also apply any preprocessing within the generator itself.
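One way to do this without touching the loading code is to wrap any generator that yields (x, y) batches in a second generator that applies your Python functions to the inputs. This is only a sketch: with_preprocessing is a hypothetical helper name, and the scaling and casting steps in the usage note are placeholders for whatever change of representation you need.

```python
import numpy as np


def with_preprocessing(batches, *functions):
    """Apply each function, in order, to the inputs of every (x, y) batch.

    `batches` is any iterable or generator of (x, y) pairs; the labels
    are passed through unchanged. Hypothetical helper, not a Keras API.
    """
    for x, y in batches:
        for fn in functions:
            x = fn(x)
        yield x, y
```

For example, with_preprocessing(batch_gen, lambda x: x / 255.0, lambda x: x.astype("float32")) would scale and cast each batch lazily, just before it is handed to the model, where batch_gen stands for your own batch generator.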

Hope this helps.


 