Connecting CNN to RNN

Question

I want to train a neural network to classify simple videos. My approach is to use a CNN whose output is connected to an RNN (LSTM). I'm having some trouble trying to connect the two together.

X_train.shape
(2400, 256, 256, 3)

Y_train.shape
(2400, 6)

Here is the network I defined

model = Sequential()
model.add(Conv2D(32 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu' , input_shape = (256,256,3)))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(64 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(128 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(256 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Flatten())

model.add(layers.LSTM(64, return_sequences=True, input_shape=(1,256)))

model.add(layers.LSTM(32, return_sequences=True))

model.add(layers.LSTM(32))

model.add(layers.Dense(6, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I get the following error

ValueError: Input 0 of layer lstm_7 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 65536]

I have a feeling it has something to do with the input shape of the RNN. The aim is to have the CNN pick up on features of frames and then have the RNN pick up on high-level differences between frames. Would it be better to do this with two entirely different networks? If so, how can I achieve that? Also, is there a way to train the two networks with batches of data, since the dataset is quite large?

Answer 1

You are quite right. In TensorFlow, LSTM expects an input of shape (batch_size, time_steps, embedding_size); see the example for more details. In your case, try using model.add(Reshape((16, 16*256))) instead of model.add(Flatten()). Not the most beautiful solution, but it will allow you to test things.
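To see where the numbers in Reshape((16, 16*256)) come from, here is a small sketch of the shape arithmetic for the network in the question. It assumes that each stride-1 'same' convolution keeps the spatial size and that each stride-2 'same' pooling layer rounds the size up after halving it; the helper name same_pool_output is made up for illustration.

```python
import math

def same_pool_output(size, stride=2):
    """Spatial size after a pooling layer with padding='same' (hypothetical helper)."""
    return math.ceil(size / stride)

size = 256  # input height and width from the question
for _ in range(4):  # four Conv2D + MaxPool2D stages
    # The stride-1 'same' convolution leaves the size unchanged,
    # so only the pooling layer shrinks it.
    size = same_pool_output(size)

channels = 256  # filters in the last Conv2D layer
flat = size * size * channels  # what Flatten() produces per sample
time_steps, features = size, size * channels  # what Reshape((16, 16*256)) produces

print(size, flat, (time_steps, features))
```

The flattened size matches the 65536 reported in the error message, and the reshaped form gives the LSTM 16 time steps of 4096 features each.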

Answer 2

The problem is the data passed to the LSTM, and it can be solved inside your network. The LSTM expects 3D input, and Flatten destroys that structure. There are two options you can adopt: 1) reshape to (batch_size, H, W*channel); 2) reshape to (batch_size, W, H*channel). Either way you have 3D data to use inside your LSTM. An example is below.

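from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPool2D, Reshape, Permute,
                                     Lambda, LSTM, Dense)

model = Sequential()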
model.add(Conv2D(32 , (3,3) , strides = 1 , padding = 'same' , 
                 activation = 'relu' , input_shape = (256,256,3)))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(64 , (3,3) , strides = 1 , padding = 'same' , 
                 activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(128 , (3,3) , strides = 1 , padding = 'same' , 
                 activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(256 , (3,3) , strides = 1 , padding = 'same' , 
                 activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

def ReshapeLayer(x):
    
    # x has shape (batch, H, W, channel)
    shape = x.shape
    
    # Option 1: rows as time steps -> (H, W*channel)
    reshape = Reshape((shape[1],shape[2]*shape[3]))(x)
    
    # Option 2: columns as time steps -> (W, H*channel)
    # transpose = Permute((2,1,3))(x)
    # reshape = Reshape((shape[2],shape[1]*shape[3]))(transpose)
    
    return reshape

model.add(Lambda(ReshapeLayer))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))

model.add(Dense(6, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', 
              metrics=['accuracy'])
model.summary()
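The difference between the two reshape options is easier to see on a tiny feature map. The sketch below uses plain Python lists instead of Keras tensors, with made-up sizes (H=2, W=3, 2 channels) and made-up string labels, and assumes the usual row-major flattening order; the function names are hypothetical.

```python
# Tiny feature map: H=2 rows, W=3 columns, C=2 channels (illustrative values only).
H, W, C = 2, 3, 2
fmap = [[[f"r{r}c{c}k{k}" for k in range(C)] for c in range(W)] for r in range(H)]

def rows_as_steps(fmap):
    """Option 1, shape (H, W*C): each time step is one row, flattened over width and channels."""
    return [[v for pixel in row for v in pixel] for row in fmap]

def cols_as_steps(fmap):
    """Option 2, shape (W, H*C): swap H and W, then flatten each column over height and channels."""
    height, width = len(fmap), len(fmap[0])
    return [[v for r in range(height) for v in fmap[r][c]] for c in range(width)]

opt1 = rows_as_steps(fmap)
opt2 = cols_as_steps(fmap)

print(len(opt1), len(opt1[0]))  # time steps and features per step for option 1
print(len(opt2), len(opt2[0]))  # time steps and features per step for option 2
print(opt1[0])  # first time step of option 1: the whole first row
print(opt2[0])  # first time step of option 2: the whole first column
```

In both cases the LSTM steps across one spatial axis of a single image, so which option works better likely depends on the data and is worth testing empirically.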
