

How do I train a neural network in Keras on data stored in HDF5 files?

I have two fairly large PyTables EArrays which contain the inputs and labels for a regression task. The input array is 4D (55k x 128 x 128 x 3) and the label array is 1D (55k). I have an NN architecture specified in Keras which I want to train on this data, but there are two problems.

  1. The input array, at least, is too large to fit in memory at once.
  2. I only want to train on some random subset of the full data, since I want to take train, test, and validation splits. I select the splits by slicing on random subsets of the indices.

How can I select subsets of the HDF5 arrays (input and output) according to train/test indices and train on the training subsets, without reading them into memory all at once? Is there some way to create a "view" of the on-disk array that can be sliced and that Keras will see as a regular NumPy ndarray?

What I've tried so far is to convert my arrays to Keras HDF5Matrix objects (with e.g. X = keras.utils.io_utils.HDF5Matrix(X)), but when I then slice this to get a training split, the full slice (80% of the full array) gets put into memory, which gives me a MemoryError.

You can use the fit_generator method of your Keras model. Just write your own generator class/function that pulls random batches of samples from your HDF5 file. That way, you never have to have all the data in memory at once. Similarly, if your validation data are too large to fit in memory, the validation_data argument to fit_generator also accepts a generator that produces batches from your validation data.
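For example, here is a minimal sketch of such a generator. It assumes the EArrays were saved as nodes /inputs and /labels in a file called data.h5; substitute whatever file path and node names you actually used.

    import numpy as np
    import tables

    def hdf5_batch_generator(h5_path, indices, batch_size=32):
        """Yield (inputs, labels) batches read from disk for the given row indices."""
        indices = np.array(indices)          # copy so the in-place shuffle stays local
        with tables.open_file(h5_path, mode='r') as f:
            inputs = f.root.inputs           # assumed node name, 55k x 128 x 128 x 3
            labels = f.root.labels           # assumed node name, 55k
            while True:                      # Keras expects the generator to loop forever
                np.random.shuffle(indices)   # new batch order each epoch
                for start in range(0, len(indices), batch_size):
                    batch_idx = indices[start:start + batch_size]
                    # Only these rows are read from the HDF5 file
                    x = np.stack([inputs[i] for i in batch_idx])
                    y = np.array([labels[i] for i in batch_idx])
                    yield x, y

Only one batch of rows is ever held in memory at a time, since each batch is read from disk on demand.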

Essentially, you just need to do an np.random.shuffle on an array of indices into your data set, then split the random index array into training, validation, and testing index arrays. Your generator arguments to fit_generator will then just pull batches from your HDF5 file according to sequential batches of indices in the training and validation index arrays.
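A rough sketch of that index bookkeeping and the training call, assuming an 80/10/10 split, a compiled Keras model named model, and the hdf5_batch_generator sketched above:

    import numpy as np

    n_samples = 55000                        # number of rows in the EArrays
    idx = np.arange(n_samples)
    np.random.shuffle(idx)

    n_train = int(0.8 * n_samples)           # assumed 80/10/10 split
    n_val = int(0.1 * n_samples)
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]

    batch_size = 32
    model.fit_generator(
        hdf5_batch_generator('data.h5', train_idx, batch_size),
        steps_per_epoch=len(train_idx) // batch_size,
        epochs=10,
        validation_data=hdf5_batch_generator('data.h5', val_idx, batch_size),
        validation_steps=len(val_idx) // batch_size)

The held-out test_idx rows can later be scored the same way with evaluate_generator, which also accepts a generator plus a steps argument.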

