

Keras: Loading minibatches from HDF5 and CSV

Posted 2017-01-08 06:44:34 · Tags: python, neural-network, keras

I have a large dataset, too large to fit into RAM, which is available either as HDF5 or CSV. How can I feed it into Keras in minibatches? Also, will this shuffle it for me, or do I need to pre-shuffle the dataset?

(I'm also interested in this when the input is a NumPy recarray, since I believe Keras wants the input to be an ndarray.)
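On the recarray point: Keras generally works with plain numeric arrays, so one option is to split the recarray into a 2-D feature array and a 1-D label array before training. A minimal sketch, assuming the features and the label live in named fields of the recarray; the function name `recarray_to_xy` and the field names are hypothetical. NumPy 1.16+ also provides `numpy.lib.recfunctions.structured_to_unstructured` for the same job.

```python
import numpy as np


def recarray_to_xy(rec, feature_fields, label_field, dtype=np.float32):
    """Split a NumPy recarray into a 2-D feature ndarray and a 1-D label ndarray.

    feature_fields / label_field are hypothetical field names for illustration.
    """
    # Each rec[name] is a plain 1-D ndarray; stack them as columns -> (n_rows, n_features)
    x = np.column_stack([rec[name] for name in feature_fields]).astype(dtype)
    y = np.asarray(rec[label_field])
    return x, y


rec = np.rec.array(
    [(1.0, 2.0, 0), (3.0, 4.0, 1)],
    dtype=[("a", "f8"), ("b", "f8"), ("label", "i8")],
)
x, y = recarray_to_xy(rec, ["a", "b"], "label")
```

The same function can be applied per batch inside a generator if the recarray is itself too large to convert in one go.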

And if I want to do some lightweight preprocessing in Keras before learning (e.g. apply a few Python functions to the data to change the representation), can that be added?

1 Answer

Have a look at the fit_generator method available in Keras (see https://keras.io/models/sequential/#sequential-model-methods). It fits the model on data generated batch by batch by a Python generator. Since the generator is under your control, you can write the shuffling logic there yourself.
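A minimal sketch of such a generator for the HDF5 case, assuming the file stores features and labels as two equal-length datasets; the function name `hdf5_batch_generator` is hypothetical. It reads one contiguous slice per batch, so only one batch is in RAM at a time. Because HDF5 reads are cheapest on contiguous ranges (and h5py only accepts index lists in increasing order), it shuffles the order of the batches each epoch and then shuffles rows within each batch in memory. That is only a partial shuffle, so if the rows on disk are sorted (e.g. by label), pre-shuffling the file once is still worthwhile.

```python
import numpy as np


def hdf5_batch_generator(x_data, y_data, batch_size, shuffle=True, seed=None):
    """Yield (x_batch, y_batch) tuples indefinitely, as generator-based Keras training expects.

    x_data, y_data: any sliceable array-like with len(), e.g. h5py Datasets
    (assumption: features and labels are stored as two datasets of equal length).
    """
    n = len(x_data)
    starts = np.arange(0, n, batch_size)  # first row index of each batch
    rng = np.random.default_rng(seed)
    while True:  # one full pass of the for-loop below is one epoch
        if shuffle:
            rng.shuffle(starts)  # randomise the order of the batches each epoch
        for start in starts:
            start = int(start)
            stop = min(start + batch_size, n)
            # Contiguous slice: efficient HDF5 read; only this batch is loaded into RAM
            x = np.asarray(x_data[start:stop])
            y = np.asarray(y_data[start:stop])
            if shuffle:
                perm = rng.permutation(len(x))  # shuffle rows within the batch in memory
                x, y = x[perm], y[perm]
            yield x, y
```

With h5py, `x_data` and `y_data` could be `f["X"]` and `f["y"]` from `h5py.File(path, "r")` (dataset names are assumptions). The generator is then passed to `model.fit_generator(gen, steps_per_epoch=math.ceil(n / batch_size), epochs=...)` in Keras 2.x; in TensorFlow 2.x, `model.fit` accepts the generator directly and `fit_generator` is deprecated.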

You can also apply any preprocessing within the generator itself.
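For the CSV case, a minimal sketch of a generator that streams the file in chunks of rows and applies a preprocessing function to each batch, assuming every column is numeric, the last column is the label, and the first line is a header (all hypothetical layout choices; `csv_batch_generator` and `preprocess` are illustrative names). A CSV cannot be read at random offsets cheaply, so rows come out in file order; shuffling the file once on disk beforehand is the simple way to get randomised batches.

```python
import csv

import numpy as np


def csv_batch_generator(path, batch_size, preprocess=None, skip_header=True):
    """Yield (x_batch, y_batch) from a numeric CSV, one chunk of rows at a time.

    Assumed layout: all columns numeric, last column is the label.
    `preprocess`, if given, maps an x_batch array to a new array.
    """

    def to_xy(rows):
        arr = np.asarray(rows, dtype=np.float32)
        x, y = arr[:, :-1], arr[:, -1]
        if preprocess is not None:
            x = preprocess(x)  # lightweight per-batch preprocessing hook
        return x, y

    while True:  # reopen the file from the top for each epoch
        with open(path, newline="") as f:
            reader = csv.reader(f)
            if skip_header:
                next(reader, None)
            rows = []
            for row in reader:
                if not row:  # skip blank lines
                    continue
                rows.append([float(v) for v in row])
                if len(rows) == batch_size:
                    yield to_xy(rows)
                    rows = []
            if rows:  # final partial batch
                yield to_xy(rows)
```

Keeping the preprocessing as a per-batch function means it only ever touches one batch in memory, which matches the lightweight transformations the question describes.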

Hope this helps.