I used a template of a custom image generator in keras, so that I can use hdf5 files as input. Initially, the code was giving a "shape" error, so I only included from tensorflow.python.keras.utils.data_utils import Sequence
following this post . Now I use it in this form, as you can also see in my colab notebook :
from numpy.random import uniform, randint
from tensorflow.python.keras.preprocessing.image import ImageDataGenerator
import numpy as np
from tensorflow.python.keras.utils.data_utils import Sequence
class CustomImagesGenerator(Sequence):
def __init__(self, x, zoom_range, shear_range, rescale, horizontal_flip, batch_size):
self.x = x
self.zoom_range = zoom_range
self.shear_range = shear_range
self.rescale = rescale
self.horizontal_flip = horizontal_flip
self.batch_size = batch_size
self.__img_gen = ImageDataGenerator()
self.__batch_index = 0
def __len__(self):
# steps_per_epoch, if unspecified, will use the len(generator) as a number of steps.
# hence this
return np.floor(self.x.shape[0]/self.batch_size)
# @property
# def shape(self):
# return self.x.shape
def next(self):
return self.__next__()
def __next__(self):
start = self.__batch_index*self.batch_size
stop = start + self.batch_size
self.__batch_index += 1
if stop > len(self.x):
raise StopIteration
transformed = np.array(self.x[start:stop]) # loads from hdf5
for i in range(len(transformed)):
zoom = uniform(self.zoom_range[0], self.zoom_range[1])
transformations = {
'zx': zoom,
'zy': zoom,
'shear': uniform(-self.shear_range, self.shear_range),
'flip_horizontal': self.horizontal_flip and bool(randint(0,2))
}
transformed[i] = self.__img_gen.apply_transform(transformed[i], transformations)
import pdb;pdb.set_trace()
return transformed * self.rescale
And I call the generator with:
import h5py
import tables
in_hdf5_file = tables.open_file("gdrive/My Drive/Colab Notebooks/dataset.hdf5", mode='r')
images = in_hdf5_file.root.train_img
my_gen = CustomImagesGenerator(
images,
zoom_range=[0.8, 1],
batch_size=32,
shear_range=6,
rescale=1./255,
horizontal_flip=False
)
classifier.fit_generator(my_gen, steps_per_epoch=100, epochs=1, verbose=1)
The import of Sequence
resolved the "shape" error, but now I am getting the error:
Exception in thread Thread-5: Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/data_utils.py", line 742, in _run sequence = list(range(len(self.sequence))) TypeError: 'numpy.float64' object cannot be interpreted as an integer
How can I resolve this? I suspect it might be again a conflict in the keras packages and don't know how to tackle it.
Example usage with model.fit()
in your case:
from tensorflow.keras.utils import to_categorical
import tensorflow as tf
import tables
#define your model
...
#load your data from an hdf5 file
in_hdf5_file = tables.open_file("path/to/your/dataset.hdf5", mode='r')
x = in_hdf5_file.root.train_img[:]
y = in_hdf5_file.root.train_labels[:]
yourModel.fit(x, to_categorical(y, 3), epochs=2, batch_size=5)
For more info read my comments to your original post, or feel free to ask.
EDIT: I fixed your generator, now it only needs a path to your hdf5 file
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow.keras.utils import to_categorical
import numpy as np
from tensorflow.python.keras.utils.data_utils import Sequence
import tensorflow as tf
import tables
#define your model
...
#training
def h5data_generator(path, batch_size=1):
batch_index = 0
while 1:
with tables.open_file(path, mdoe='r') as f:
x = f.root.train_img[batch_index:batch_index + batch_size]
y = f.root.train_labels[batch_index:batch_index + batch_size]
if batch_index >= x.shape[0]:
batch_index = 0
batch_index += 1
yield (x, to_categorical(y, 3))
del x
del y
my_gen = h5data_generator("path/to/your/dataset.hdf5")
yourModel.fit_generator(my_gen, steps_per_epoch=100, epochs=20, verbose=1)
The problem with your generator was bad data output on step, it wasn't outputting (x, y)
, there was no way it could, it was outputting x
(image in your case), also since it was using Sequential
keras tried to interpret it as an object that used it's api(not the case in your generator). Also it doesn't have to be a class
, it needs to be a python generator , as indicated by an example in keras it self(doc string of fit_generator()
),
fit_generator.__doc__
:
Fits the model on data yielded batch-by-batch by a Python generator.
The generator is run in parallel to the model, for efficiency.
For instance, this allows you to do real-time data augmentation
on images on CPU in parallel to training your model on GPU.
The use of `keras.utils.Sequence` guarantees the ordering
and guarantees the single use of every input per epoch when
using `use_multiprocessing=True`.
Arguments:
generator: A generator or an instance of `Sequence`
(`keras.utils.Sequence`)
object in order to avoid duplicate data
when using multiprocessing.
The output of the generator must be either
- a tuple `(inputs, targets)`
- a tuple `(inputs, targets, sample_weights)`.
This tuple (a single output of the generator) makes a single batch.
Therefore, all arrays in this tuple must have the same length (equal
to the size of this batch). Different batches may have different
sizes.
For example, the last batch of the epoch is commonly smaller than
the
others, if the size of the dataset is not divisible by the batch
size.
The generator is expected to loop over its data
indefinitely. An epoch finishes when `steps_per_epoch`
batches have been seen by the model.
steps_per_epoch: Total number of steps (batches of samples)
to yield from `generator` before declaring one epoch
finished and starting the next epoch. It should typically
be equal to the number of samples of your dataset
divided by the batch size.
Optional for `Sequence`: if unspecified, will use
the `len(generator)` as a number of steps.
epochs: Integer, total number of iterations on the data.
verbose: Verbosity mode, 0, 1, or 2.
callbacks: List of callbacks to be called during training.
validation_data: This can be either
- a generator for the validation data
- a tuple (inputs, targets)
- a tuple (inputs, targets, sample_weights).
validation_steps: Only relevant if `validation_data`
is a generator. Total number of steps (batches of samples)
to yield from `generator` before stopping.
Optional for `Sequence`: if unspecified, will use
the `len(validation_data)` as a number of steps.
validation_freq: Only relevant if validation data is provided. Integer
or `collections.Container` instance (e.g. list, tuple, etc.). If an
integer, specifies how many training epochs to run before a new
validation run is performed, e.g. `validation_freq=2` runs
validation every 2 epochs. If a Container, specifies the epochs on
which to run validation, e.g. `validation_freq=[1, 2, 10]` runs
validation at the end of the 1st, 2nd, and 10th epochs.
class_weight: Dictionary mapping class indices to a weight
for the class.
max_queue_size: Integer. Maximum size for the generator queue.
If unspecified, `max_queue_size` will default to 10.
workers: Integer. Maximum number of processes to spin up
when using process-based threading.
If unspecified, `workers` will default to 1. If 0, will
execute the generator on the main thread.
use_multiprocessing: Boolean.
If `True`, use process-based threading.
If unspecified, `use_multiprocessing` will default to `False`.
Note that because this implementation relies on multiprocessing,
you should not pass non-picklable arguments to the generator
as they can't be passed easily to children processes.
shuffle: Boolean. Whether to shuffle the order of the batches at
the beginning of each epoch. Only used with instances
of `Sequence` (`keras.utils.Sequence`).
Has no effect when `steps_per_epoch` is not `None`.
initial_epoch: Epoch at which to start training
(useful for resuming a previous training run)
Returns:
A `History` object.
Example:
```python
def generate_arrays_from_file(path):
while 1:
f = open(path)
for line in f:
# create numpy arrays of input data
# and labels, from each line in the file
x1, x2, y = process_line(line)
yield ({'input_1': x1, 'input_2': x2}, {'output': y})
f.close()
model.fit_generator(generate_arrays_from_file('/my_file.txt'),
steps_per_epoch=10000, epochs=10)
```
Raises:
ValueError: In case the generator yields data in an invalid format.
For more info check out the github page of keras, fit_generator()
to be exact , or again feel free to ask.
EDIT 2: You can also pass batch_size
to the h5data_generator()
, which will set the batch size of data getting pulled from your dataset on single step.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.