
TensorFlow Keras MaxPool2D breaks LSTM with CTC loss?

I am trying to tie together a CNN layer, two LSTM layers, and ctc_batch_cost as the loss, but I'm running into problems. My model is supposed to work with grayscale images.

While debugging, I figured out that if I use just a CNN layer that keeps the output size equal to the input size, followed by the LSTMs and CTC, the model is able to train:

# === Without MaxPool2D ===
from tensorflow.keras.layers import (Activation, Bidirectional, Conv2D, Dense,
                                     Input, LSTM, Reshape, TimeDistributed)

inp = Input(name='inp', shape=(128, 32, 1))

cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)

# Go from Bx128x32x1 to Bx128x32 (B x TimeSteps x Features)
rnn_inp = Reshape((128, 32))(cnn)

blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)

# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)

# Model compiles, calling fit works!

But when I add a MaxPool2D layer that halves the dimensions, I get the error sequence_length(0) <= 64, similar to the one presented here.

# === With MaxPool2D ===
from tensorflow.keras.layers import MaxPool2D  # plus the layers imported above

inp = Input(name='inp', shape=(128, 32, 1))

cnn = Conv2D(name='conv', filters=1, kernel_size=3, strides=1, padding='same')(inp)
maxp = MaxPool2D(name='maxp', pool_size=2, strides=2, padding='valid')(cnn) # -> 64x16x1

# Go from Bx64x16x1 to Bx64x16 (B x TimeSteps x Features)
rnn_inp = Reshape((64, 16))(maxp)

blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm1')(rnn_inp)
blstm = Bidirectional(LSTM(256, return_sequences=True), name='blstm2')(blstm)

# Softmax.
dense = TimeDistributed(Dense(80, name='dense'), name='timedDense')(blstm)
rnn_outp = Activation('softmax', name='softmax')(dense)

# Model compiles, but calling fit crashes with:
# InvalidArgumentError: sequence_length(0) <= 64
#    [[{{node ctc_loss_1/CTCLoss}}]]

After struggling with this problem for about 3 days, I posted the above question here on StackOverflow. About 2 hours after posting the question, I finally figured it out.

TL;DR Solution:

If you're using ctc_batch_cost:

Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as the input_length argument (see the sketch below these notes).

If you're using ctc_loss:

Make sure you're passing the lengths (numbers of timesteps) of the sequences entering your RNNs as the logit_length argument.
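For the ctc_batch_cost case, here is a minimal sketch of the common Keras pattern of computing the loss inside a Lambda layer. It assumes the model defined above (inp, rnn_outp); the extra inputs labels, input_length, and label_length, the helper ctc_lambda_func, and max_label_len are illustrative names, not part of my original code:

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

max_label_len = 16  # assumed maximum label length, for illustration only

# Ground-truth labels and the two length tensors become extra model inputs.
labels = Input(name='labels', shape=(max_label_len,), dtype='float32')
input_length = Input(name='input_length', shape=(1,), dtype='int64')
label_length = Input(name='label_length', shape=(1,), dtype='int64')

def ctc_lambda_func(args):
    y_pred, y_true, inp_len, lab_len = args
    return K.ctc_batch_cost(y_true, y_pred, inp_len, lab_len)

# rnn_outp is the softmax output defined above.
loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')(
    [rnn_outp, labels, input_length, label_length])

model = Model(inputs=[inp, labels, input_length, label_length], outputs=loss_out)
# The Lambda layer already outputs the loss, so compile with a pass-through.
model.compile(optimizer='adam', loss={'ctc': lambda y_true, y_pred: y_pred})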

Solution:

The solution lies in the documentation, which, being relatively sparse, can be cryptic for a machine learning newbie like me.

The TensorFlow documentation for ctc_batch_cost reads:

tf.keras.backend.ctc_batch_cost(
    y_true, y_pred, input_length, label_length
)

...

input_length: tensor (samples, 1) containing the sequence length for each batch item in y_pred.

...

input_length corresponds to logit_length from the TensorFlow documentation for the ctc_loss function:

tf.nn.ctc_loss(
    labels, logits, label_length, logit_length, logits_time_major=True, unique=None,
    blank_index=None, name=None
)

...

logit_length: tensor of shape [batch_size]. Length of input sequence in logits.

...
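For comparison, a self-contained toy call to tf.nn.ctc_loss looks like the sketch below; all shapes and values are made up purely for illustration. The point is that logit_length counts the timesteps of logits, not the width of the original input:

import tensorflow as tf

batch_size, time_steps, num_classes = 4, 64, 80
logits = tf.random.normal([batch_size, time_steps, num_classes])
# Dummy dense labels in [1, num_classes); class 0 is the blank below.
labels = tf.random.uniform([batch_size, 10], minval=1, maxval=num_classes,
                           dtype=tf.int32)
label_length = tf.fill([batch_size], 10)
# logit_length counts the timesteps of `logits` (64 here).
logit_length = tf.fill([batch_size], time_steps)

loss = tf.nn.ctc_loss(labels, logits, label_length, logit_length,
                      logits_time_major=False, blank_index=0)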

That's where it clicked, at the word logit. So, the argument for input_length or logit_length is supposed to be a tensor/container (in my case, a NumPy array) holding the lengths (i.e. the numbers of timesteps) of the sequences entering the RNN (in my case, the LSTMs) as input.

I was originally making the mistake of taking the required length to be the width of the grayscale images that act as input for the whole network (CNN + MaxPool2D + RNN), i.e. 128. But because the MaxPool2D layer halves that dimension, the RNN's input only has 64 timesteps, and the CTC loss function crashes when told to expect more.
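Concretely, for the MaxPool2D model above, the lengths fed to fit must be 64 (the timestep count after pooling), not 128 (the image width). A sketch, where train_images and train_labels are hypothetical training arrays and model/max_label_len come from the wiring sketch earlier:

import numpy as np

batch_size = 16
# 64 timesteps after MaxPool2D halves the 128-wide input -- not 128.
train_input_length = np.full((batch_size, 1), 64, dtype='int64')
train_label_length = np.full((batch_size, 1), max_label_len, dtype='int64')

model.fit(
    x=[train_images, train_labels, train_input_length, train_label_length],
    y=np.zeros((batch_size, 1)),  # dummy targets; the Lambda output is the loss
    batch_size=batch_size)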

Now fit runs without crashing.
