
Understanding Keras LSTM model with varying size of batches

I have a working LSTM model in Keras, but I need more control over things, so I'm moving to TensorFlow (1.13).

While doing so, the first thing I ran into was batch size handling.

This is the simple Keras model:

model1 = Sequential()
model1.add(LSTM(64, input_shape=(seq_length, X_train.shape[2]),return_sequences=False))
model1.add(Dense(y_train.shape[2], activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# y_train[:,-1,:] takes only the fifth (last) timestep's y of each sample
model1.fit(X_train, y_train[:,-1,:], epochs=300, batch_size=512)

I'm using a batch_size of 512, for example, while the number of samples in X_train is 108765, which is not evenly divisible by 512; that means the last step of each epoch will have 221 samples instead of 512. Keras handles this under the hood and I didn't need to do anything about it. When using TensorFlow, however, I need to specify a batch size for the initial state, which will be used throughout the epoch as the state of tf.nn.dynamic_rnn.

So:

  1. How does Keras handle this?

  2. What can be done in TensorFlow in order to overcome this, without losing data to fit the batch size?

It is a bit complicated to go over everything, but let's start with how you can do this in TensorFlow. First, some constants:

import numpy as np
import tensorflow as tf

# Input data
X = np.random.rand(108765, 10, 3).astype(np.float32)
# The number of epochs
epochs = 10
# The size of the batch, in your case 512
batch_size = 512
# Size of the cell, 64 as per your code
cell_size = 64

Now, the data loading logic. We can create a tf.data.Dataset that will take care of loading the data automatically:

# A dataset from a tensor
dataset = tf.data.Dataset.from_tensor_slices(X)
# Shuffle the dataset with some arbitrary buffer size
dataset = dataset.shuffle(buffer_size=10)
# Divide the dataset into batches. Once you reach the last batch which won't be 512, the dataset will know exactly which elements remain and should be passed as a batch.
dataset = dataset.batch(batch_size)
# An iterator that can be reinitialized over and over again, therefore having a new shuffle of the data each time
iterator = dataset.make_initializable_iterator()
# A node that can be run to obtain the next element in the dataset. However, this node will be linked in the model so obtaining the next element will be done automatically
data_X = iterator.get_next()
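
The snippet above only loads the inputs. In your case the labels also need to be shuffled and batched together with the inputs; one way to do that (a sketch with made-up label shapes, since your y_train isn't shown here, and with data_y as a name I'm introducing) is to pass a tuple to from_tensor_slices:

# Hypothetical labels, e.g. 5 classes per sample (stand-in for y_train[:,-1,:])
Y = np.random.rand(108765, 5).astype(np.float32)
# Zipping inputs and labels so they are shuffled and batched together
dataset = tf.data.Dataset.from_tensor_slices((X, Y))
dataset = dataset.shuffle(buffer_size=10)
dataset = dataset.batch(batch_size)
iterator = dataset.make_initializable_iterator()
# get_next() now yields a (data_X, data_y) pair for each batch
data_X, data_y = iterator.get_next()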

The final part of the model is the tf.nn.dynamic_rnn itself:

# The LSTM cell, with 64 units as per your code
cell = tf.nn.rnn_cell.LSTMCell(cell_size)
# The size of the *current* batch: 512 for full batches, 221 for the last one
current_batch_size = tf.shape(data_X)[0]
# A zero initial state whose first dimension follows the current batch size
init_state = cell.zero_state(current_batch_size, tf.float32)
# outputs has shape (current_batch_size, 10, cell_size); states is the final LSTM state
outputs, states = tf.nn.dynamic_rnn(cell=cell, inputs=data_X, initial_state=init_state)
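
To mirror the Dense softmax layer and the categorical crossentropy loss from your Keras model, a minimal sketch could look like the following. It assumes the data_y tensor and the 5 hypothetical classes from the dataset sketch above; adapt the shapes to your own data.

# The output of the last timestep, shape (current_batch_size, cell_size)
last_output = outputs[:, -1, :]
# A dense layer with one logit per class, like Keras' Dense on top of the LSTM
logits = tf.layers.dense(last_output, 5)
# Softmax crossentropy on one-hot labels, i.e. categorical_crossentropy
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=data_y, logits=logits))
# Adam optimizer, as in optimizer='adam'
train_op = tf.train.AdamOptimizer().minimize(loss)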

Now, we are set to create the training logic:

# Creation of a session
with tf.Session() as sess:
    # Initialization of all variables in the TF graph
    sess.run(tf.global_variables_initializer())
    # Execute the block below `epochs` times
    for e in range(epochs):
        # Each time, reinitialize the iterator to obtain a fresh shuffle of the training data
        sess.run(iterator.initializer)
        try:
            # As long as there are elements, execute the block below
            while True:
                # The whole training logic goes here (see the sketch below);
                # at a minimum, run the graph so the iterator advances
                sess.run(outputs)
        except tf.errors.OutOfRangeError:
            pass
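
Assuming the loss and train_op from the sketch above, the inner block could be filled in roughly like this:

        try:
            # As long as there are elements, keep training on the next batch
            while True:
                # One optimization step; the iterator automatically feeds the next
                # batch (including the final, smaller one) into data_X and data_y
                batch_loss, _ = sess.run([loss, train_op])
        except tf.errors.OutOfRangeError:
            # Raised once the last batch of the epoch has been consumed
            pass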

I assume that this code should help you assemble your own logic to create a TF model and train it. As for how Keras does things in the background, I don't know exactly. Similarly to TF, let's assume that its data loading module knows which elements have already been passed and which elements remain.

Finally, I want to point out that all these things are complicated in themselves and you should do a bit of reading on your own. In particular, the purpose of this answer is to help you understand how you can do the data loading without losing any information. Good luck!

There are two types of RNN in Keras:

  • stateful=True
  • stateful=False (your case according to the code you posted)

The difference between them is that the True version keeps the states in memory between batches (to simulate that the second batch is a continuation of the first batch, for instance), while the False version creates a new state matrix for every batch (so every batch contains full sequences, not parts of sequences).

So, in the True case, Keras does face the same problem as you do: it needs a fixed batch size, and in fact it demands that you specify the batch size when you use stateful=True .

But, if you're using stateful=False , it will just create a new states matrix full of zeros.

So, basically:

  • If you want to create a Tensorflow stateful=True layer, you need the batch size to be constant, just as Keras also does
  • If you want to create a Tensorflow stateful=False layer, you can just create new states as an all-zeros matrix with shape (samples, output_dim)
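
For illustration, this is roughly how the two cases look on the Keras side (a minimal sketch reusing seq_length, X_train and y_train from your code; note how stateful=True forces a fixed batch size via batch_input_shape):

from keras.models import Sequential
from keras.layers import LSTM, Dense

# stateful=False: any batch size is accepted; states are reset to zeros each batch
model_false = Sequential()
model_false.add(LSTM(64, input_shape=(seq_length, X_train.shape[2]), stateful=False))
model_false.add(Dense(y_train.shape[2], activation='softmax'))

# stateful=True: the batch size must be fixed and declared up front,
# and every batch you feed must have exactly that size
model_true = Sequential()
model_true.add(LSTM(64, batch_input_shape=(512, seq_length, X_train.shape[2]), stateful=True))
model_true.add(Dense(y_train.shape[2], activation='softmax'))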
