Combining CNN and bidirectional LSTM

Question

I am trying to combine CNN and LSTM for image classification.

I tried the following code and I am getting an error. I have 4 classes on which I want to train and test.

Following is the code:

from keras.models import Sequential
from keras.layers import LSTM,Conv2D,MaxPooling2D,Dense,Dropout,Input,Bidirectional,Softmax,TimeDistributed


input_shape = (200,300,3)
Model = Sequential()
Model.add(TimeDistributed(Conv2D(
            filters=16, kernel_size=(12, 16), activation='relu', input_shape=input_shape)))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(TimeDistributed(Conv2D(
            filters=24, kernel_size=(8, 12), activation='relu')))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(TimeDistributed(Conv2D(
            filters=32, kernel_size=(5, 7), activation='relu')))
Model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2),strides=2)))
Model.add(Bidirectional(LSTM((10),return_sequences=True)))
Model.add(Dense(64,activation='relu'))
Model.add(Dropout(0.5))
Model.add(Softmax(4))
Model.compile(loss='sparse_categorical_crossentropy',optimizer='adam')
Model.build(input_shape)

I am getting the following error:

ValueError: Input tensor must be of rank 3, 4 or 5 but was 2.

Answer 1

I found a lot of problems in the code:

Your data is 4D (batch_size, H, W, channels), so plain Conv2D layers are enough; TimeDistributed is not needed.
Your output is 2D (batch_size, classes), so set return_sequences=False in the last LSTM layer.
Your last layers are messy: there is no need to put a Dropout between a layer's output and its activation. Note also that Softmax(4) does not create a 4-unit output, because its argument is the axis; use Dense(4, activation='softmax') instead.
You need categorical_crossentropy and not sparse_categorical_crossentropy because your target is one-hot encoded (the sparse variant is for integer class labels).
LSTM expects 3D data (batch_size, timesteps, features), so you need to go from 4D (the output of the convolutions) to 3D. There are two options: 1) reshape to (batch_size, H, W * channels); 2) reshape to (batch_size, W, H * channels). Either way, you have 3D data to feed into the LSTM.
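To make the two reshape options concrete, here is a small pure-Python sketch that follows the question's 200x300x3 input through the three Conv2D + MaxPooling2D blocks and prints the 3D shape each option would hand to the LSTM. It assumes the Keras defaults (no padding, stride 1 for the convolutions); the helper names conv_valid, pool and feature_map_shape are made up for this illustration and are not part of Keras.

```python
def conv_valid(size, kernel):
    """Output length of a stride-1 convolution without padding ('valid')."""
    return size - kernel + 1


def pool(size, pool_size=2, stride=2):
    """Output length of a max-pooling layer without padding ('valid')."""
    return (size - pool_size) // stride + 1


def feature_map_shape(height, width, kernels, last_filters):
    """Follow (H, W) through repeated Conv2D + MaxPooling2D(2, strides=2) blocks."""
    for kernel_h, kernel_w in kernels:
        height = pool(conv_valid(height, kernel_h))
        width = pool(conv_valid(width, kernel_w))
    return height, width, last_filters


# Kernel sizes and the final filter count are taken from the question's model.
h, w, c = feature_map_shape(200, 300, [(12, 16), (8, 12), (5, 7)], last_filters=32)

option_1 = (h, w * c)  # rows as timesteps: H steps, W*channels features each
option_2 = (w, h * c)  # columns as timesteps: W steps, H*channels features each

print((h, w, c))  # (19, 29, 32)
print(option_1)   # (19, 928)
print(option_2)   # (29, 608)
```

Which option is better depends on whether reading the image row by row or column by column is the more meaningful sequence for your data.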

Here is a full model example:

from keras.models import Sequential
from keras.layers import (LSTM, Conv2D, MaxPooling2D, Dense, Bidirectional,
                          Lambda, Reshape, Permute)

input_shape = (200, 300, 3)  # (H, W, channels), as in the question
nclasses = 4                 # number of target classes, as in the question

def ReshapeLayer(x):
    # x is the 4D output of the convolutions: (batch_size, H, W, channels)
    shape = x.shape

    # Option 1: (batch_size, H, W*channels)
    reshape = Reshape((shape[1], shape[2] * shape[3]))(x)

    # Option 2: (batch_size, W, H*channels)
    # transpose = Permute((2, 1, 3))(x)
    # reshape = Reshape((shape[2], shape[1] * shape[3]))(transpose)

    return reshape

model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(12, 16), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(Conv2D(filters=24, kernel_size=(8, 12), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(Conv2D(filters=32, kernel_size=(5, 7), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(Lambda(ReshapeLayer))  # <========== pass from 4D to 3D
model.add(Bidirectional(LSTM(10, activation='relu', return_sequences=False)))
model.add(Dense(nclasses, activation='softmax'))  # one probability per class

model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()
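The choice of loss in the compile call depends on how the targets are stored. As a rough sketch in plain Python (no Keras needed), the snippet below shows the two layouts side by side: integer class ids, which go with sparse_categorical_crossentropy, and one-hot rows, which go with categorical_crossentropy. The helpers to_one_hot and matching_loss are hypothetical names written for this illustration only.

```python
def to_one_hot(labels, num_classes):
    """Turn integer class ids into one-hot rows (lists of 0.0 / 1.0)."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


def matching_loss(targets):
    """Return the Keras loss name that fits the layout of the targets."""
    first = targets[0]
    if isinstance(first, (list, tuple)):
        return "categorical_crossentropy"      # one row of scores per sample
    return "sparse_categorical_crossentropy"   # one integer id per sample


int_labels = [0, 2, 3, 1]                       # 4 classes, as in the question
one_hot = to_one_hot(int_labels, num_classes=4)

print(one_hot[1])                 # [0.0, 0.0, 1.0, 0.0]
print(matching_loss(int_labels))  # sparse_categorical_crossentropy
print(matching_loss(one_hot))     # categorical_crossentropy
```

If your labels are plain integers, you can either keep sparse_categorical_crossentropy or convert the labels to one-hot rows before training; mixing the two layouts is what causes shape errors at fit time.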


solution1
3 ACCEPTED 2020-10-01 10:01:52
