Confusion regarding data generator in python for use in Keras fit_generator

Question

Some questions and tutorials, such as the ones below:

suggest that a data generator for Keras should be a class with __iter__ and __next__ methods.

Other tutorials, such as:

use a normal Python function with a yield statement to provide the data. I successfully used yield in an LSTM network following the second tutorial above, but I cannot use a normal yield function in a convolutional network; fit_generator raises the error below:

'method' object is not an iterator

I haven't tried using the __next__ method, but whoever got the above error was advised to use it (EDIT: working after a fix suggested by Daniel Möller). Can someone please clarify which technique to use when, and what the difference is between a function that yields the next sample and a class with __iter__ and __next__?

My working code using yield: https://github.com/KashyapCKotak/Multidimensional-Stock-Price-Prediction/blob/master/StockTF1_4Sequential.ipynb

My current data generator function using yield (EDIT: working after a fix suggested by Daniel Möller):

def train_images_generator(self):
    for epoch in range(0, self.epochs):
      print("Current Epoch:",epoch)
      cnt = 0
      if epoch > 2000:
        learning_rate = 1e-5

      for ind in np.random.permutation(len(self.train_ids)):
        print("provided image with id:",ind)
        #get the input image and target/ground truth image based on ind
        raw = rawpy.imread(in_path)
        input_images = np.expand_dims(pack_raw(raw), axis=0) * ratio # pack the bayer image in 4 channels of RGBG

        gt_raw = rawpy.imread(gt_path)
        im = gt_raw.postprocess(use_camera_wb=True,
                      half_size=False,
                      no_auto_bright=True, output_bps=16)
        gt_images = np.expand_dims(np.float32(im / 65535.0),axis=0) # divide by 65535 to normalise (scale between 0 and 1)

        # crop

        H = input_images.shape[1] # get the image height (number of rows)
        W = input_images.shape[2] # get the image width (number of columns)

        xx = np.random.randint(0, W - ps) # get a random number in W-ps (W-512)
        yy = np.random.randint(0, H - ps) # get a random number in H-ps (H-512)
        input_patch = input_images[:, yy:yy + ps, xx:xx + ps, :]
        gt_patch = gt_images[:, yy * 2:yy * 2 + ps * 2, xx * 2:xx * 2 + ps * 2, :]

        if np.random.randint(2) == 1:  # random flip for rows
          input_patch = np.flip(input_patch, axis=1)
          gt_patch = np.flip(gt_patch, axis=1)
        if np.random.randint(2) == 1:  # random flip for columns
          input_patch = np.flip(input_patch, axis=2)
          gt_patch = np.flip(gt_patch, axis=2)
        if np.random.randint(2) == 1:  # random transpose
          input_patch = np.transpose(input_patch, (0, 2, 1, 3))
          gt_patch = np.transpose(gt_patch, (0, 2, 1, 3))

        input_patch = np.minimum(input_patch, 1.0)

        yield (input_patch,gt_patch)

How I use it:

model.fit_generator(
  generator=data.train_images_generator(),
  steps_per_epoch=steps_per_epoch,
  epochs=epochs,
  callbacks=callbacks,
  max_queue_size=50,
  # workers=0
)

Answer 1

Looking carefully at the word 'method', I see you are not "calling" your generator (you are not creating it).

You are passing just the function/method.

Suppose you have:

def generator(...):
    ...
    yield x, y

Instead of something like:

model.fit_generator(generator)

You should do something like:

model.fit_generator(generator(...))
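
The error can be reproduced without Keras. Below is a minimal sketch, assuming a hypothetical Data class whose batches method plays the role of train_images_generator. CountingBatches is likewise an illustrative name; it shows the class-based form with __iter__ and __next__ that the question mentions. Both forms produce an iterator. Only the uncalled method does not.

```python
class Data:
    """Hypothetical stand-in for the class that owns the generator method."""

    def batches(self):
        # Generator function: calling it returns a generator object (an iterator).
        n = 0
        while True:
            yield [float(n)], [float(n + 1)]
            n += 1


class CountingBatches:
    """Hand-written iterator equivalent to Data.batches (illustrative only)."""

    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        batch = [float(self.n)], [float(self.n + 1)]
        self.n += 1
        return batch


data = Data()

# Passing the method itself: nothing has been called, so there is no iterator yet.
try:
    next(data.batches)
    error_message = None
except TypeError as err:
    error_message = str(err)

# Calling the method creates the generator object, which next() accepts.
from_function = next(data.batches())
from_class = next(CountingBatches())
```

Here error_message holds the same text as the error in the question, while from_function and from_class hold the same first batch. For the consumer, the generator function and the hand-written class behave the same way; the function is simply shorter to write.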

Generator or Sequence

What is the difference between using a generator (a function with yield) and a keras.utils.Sequence?

When using a generator, training follows the exact loop order, and Keras cannot know when the generator will finish.

With a generator:

Keras cannot shuffle the batches, because it always follows the order of the loop. Any shuffling has to happen inside the generator.
You must pass steps_per_epoch, because Keras cannot know when the generator has finished (generators for Keras must be infinite).
With multiprocessing, the batches may not be handled correctly, because it is impossible to know which process will start or finish before the others.
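
These points suggest a pattern: loop forever, reshuffle inside the generator, and compute steps_per_epoch from the dataset size. The sketch below uses plain Python lists to stay self-contained; infinite_batches, samples, and batch_size are hypothetical names, and real code would yield NumPy arrays.

```python
import math
import random


def infinite_batches(samples, batch_size, seed=0):
    """Yield (inputs, targets) batches forever, reshuffling before each pass."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    while True:  # never exhausts, so the consumer can keep drawing batches
        rng.shuffle(order)  # the shuffle lives inside the generator itself
        for start in range(0, len(order), batch_size):
            chunk = [samples[i] for i in order[start:start + batch_size]]
            inputs = [x for x, _ in chunk]
            targets = [y for _, y in chunk]
            yield inputs, targets


# Toy data: each sample is an (input, target) pair.
samples = [(i, 2 * i) for i in range(10)]
batch_size = 4

# The consumer cannot ask a generator for its length, so compute it explicitly.
steps_per_epoch = math.ceil(len(samples) / batch_size)

gen = infinite_batches(samples, batch_size)
one_epoch = [next(gen) for _ in range(steps_per_epoch)]
```

In a Keras call, gen and steps_per_epoch would be passed as the generator and steps_per_epoch arguments of fit_generator, as in the question.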

With a Sequence:

You control the length of the generator, so Keras knows the number of batches automatically.
You control the indexing of the batches, so Keras can shuffle them.
You can take whatever batch you want, as many times as you want (you are not forced to take batches in sequence).
Multiprocessing can use the indices to make sure the batches do not get mixed up in the end.
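
The indexing idea can be sketched without Keras. The class below only mimics the __len__, __getitem__, and on_epoch_end interface that keras.utils.Sequence subclasses implement. In real code it would inherit from keras.utils.Sequence and return NumPy arrays. ToySequence, samples, and batch_size are hypothetical names.

```python
import math
import random


class ToySequence:
    """Mimics the keras.utils.Sequence interface (assumption: plain lists, no Keras)."""

    def __init__(self, samples, batch_size, seed=0):
        self.samples = samples
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        self.order = list(range(len(samples)))

    def __len__(self):
        # Number of batches per epoch; this is what makes the length known up front.
        return math.ceil(len(self.samples) / self.batch_size)

    def __getitem__(self, index):
        # Any batch can be requested by index, in any order, any number of times.
        start = index * self.batch_size
        picked = self.order[start:start + self.batch_size]
        inputs = [self.samples[i][0] for i in picked]
        targets = [self.samples[i][1] for i in picked]
        return inputs, targets

    def on_epoch_end(self):
        # Hook for reshuffling the sample order between epochs.
        self.rng.shuffle(self.order)


samples = [(i, 2 * i) for i in range(10)]
seq = ToySequence(samples, batch_size=4)

num_batches = len(seq)
last_batch = seq[2]        # random access: jump straight to the final batch
same_batch_again = seq[2]  # repeatable: the same index gives the same batch
```

Because each batch is addressed by index, several workers can each be handed distinct indices, which a bare generator cannot offer.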

solution1
1 ACCEPTED 2019-07-26 17:06:09
