I have a large dataset, too large to fit into RAM, which is available either as HDF5 or CSV. How can I feed it into Keras in minibatches? Also, will this shuffle it for me, or do I need to pre-shuffle the dataset?
(I'm also interested in the case where the input is a NumPy recarray, since I believe Keras wants the input to be an ndarray.)
And if I want to do some lightweight preprocessing in Keras before learning (e.g. apply a few Python functions to the data to change the representation), can that be added?
Have a look at the fit_generator method available in Keras here: https://keras.io/models/sequential/#sequential-model-methods It fits the model on data generated batch-by-batch by a Python generator. Since the generator is under your control, you can write the shuffling logic yourself.
You can also apply your pre-processing within the generator itself.
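Here is a minimal sketch of such a generator. It assumes the data source supports integer-array indexing, which is true of both NumPy arrays and h5py Dataset objects opened from an HDF5 file (note that h5py requires the index array to be sorted, hence the `np.sort` call). The `preprocess` hook and the function name are illustrative, not part of the Keras API:

```python
import numpy as np

def batch_generator(X, y, batch_size=32, shuffle=True, preprocess=None):
    """Yield minibatches forever, as Keras's fit_generator expects.

    X, y : array-likes supporting integer-array indexing
           (e.g. NumPy arrays or h5py Datasets backed by an HDF5 file).
    preprocess : optional function applied to each batch of inputs,
                 e.g. to convert a recarray slice to a plain ndarray.
    """
    n = len(X)
    while True:  # fit_generator consumes the generator indefinitely
        indices = np.arange(n)
        if shuffle:
            np.random.shuffle(indices)
        for start in range(0, n, batch_size):
            # Sorted indices keep h5py fancy indexing happy; shuffling
            # still happens because each epoch draws different batches.
            batch_idx = np.sort(indices[start:start + batch_size])
            xb, yb = X[batch_idx], y[batch_idx]
            if preprocess is not None:
                xb = preprocess(xb)
            yield xb, yb
```

You would then train with something like `model.fit_generator(batch_generator(X, y, batch_size=32), steps_per_epoch=len(X) // 32, epochs=10)`, so only one batch at a time is ever pulled into RAM.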
Hope this helps.