Connecting CNN to RNN

I want to train a neural network to classify simple videos. My approach is to use a CNN whose output is connected to an RNN (LSTM). I'm having some trouble connecting the two.

X_train.shape
(2400, 256, 256, 3)

Y_train.shape
(2400, 6)

Here is the network I defined:

model = Sequential()
model.add(Conv2D(32 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu' , input_shape = (256,256,3)))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(64 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(128 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(256 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Flatten())

model.add(layers.LSTM(64, return_sequences=True, input_shape=(1,256)))

model.add(layers.LSTM(32, return_sequences=True))

model.add(layers.LSTM(32))

model.add(layers.Dense(6, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I get the following error:

ValueError: Input 0 of layer lstm_7 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 65536]

I have a feeling it has something to do with the input shape of the RNN. The aim is to have the CNN pick up on features of individual frames and then have the RNN pick up on high-level differences between frames. Would it be better to do this with two entirely separate networks? If so, how can I achieve that? Also, is there a way to train the two networks with batches of data, since the dataset is quite large?

You are quite right. In TensorFlow, LSTM expects an input of shape (batch_size, time_steps, embedding_size); see example for more details. In your case, try using model.add(Reshape((16, 16*256))) instead of model.add(Flatten()). Not the most beautiful solution, but it will allow you to test things.
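The numbers in the error message and in the suggested Reshape can be traced by hand. The following is a minimal sketch in plain Python (no TensorFlow needed) that follows the spatial size through the four Conv2D/MaxPool2D blocks from the question. The helper name `same_pool_output` is made up for illustration, and the sketch assumes that pooling with padding='same' and stride 2 yields ceil(size / 2).

```python
import math

def same_pool_output(size, stride=2):
    """Spatial size after a pooling layer with padding='same' (hypothetical helper)."""
    return math.ceil(size / stride)

height = width = 256          # input frames are 256 x 256 x 3
filters = [32, 64, 128, 256]  # Conv2D filter counts from the question

channels = 3
for f in filters:
    # Conv2D with strides=1 and padding='same' keeps height and width
    # and sets the channel count to the number of filters.
    channels = f
    # MaxPool2D((2, 2), strides=2, padding='same') halves height and width.
    height = same_pool_output(height)
    width = same_pool_output(width)

feature_map_shape = (height, width, channels)
flattened_size = height * width * channels    # what Flatten() produces per sample
sequence_shape = (height, width * channels)   # what Reshape((16, 16*256)) produces

print(feature_map_shape)  # (16, 16, 256)
print(flattened_size)     # 65536
print(sequence_shape)     # (16, 4096)
```

So Flatten hands the LSTM a 2D tensor of shape (batch, 65536), which is the shape reported in the error, while the reshape yields (batch, 16, 4096): 16 time steps of 4096 features each.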

The problem is the shape of the data passed to the LSTM, and it can be solved inside your network. The LSTM expects 3D input, and Flatten destroys that structure. There are two possibilities you can adopt: 1) reshape to (batch_size, H, W*channel); 2) reshape to (batch_size, W, H*channel). Either way, you have 3D data to use inside your LSTM. Below is an example:

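from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPool2D, Reshape, Permute,
                                     Lambda, LSTM, Dense)

model = Sequential()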
model.add(Conv2D(32 , (3,3) , strides = 1 , padding = 'same' , 
                 activation = 'relu' , input_shape = (256,256,3)))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(64 , (3,3) , strides = 1 , padding = 'same' , 
                 activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(128 , (3,3) , strides = 1 , padding = 'same' , 
                 activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

model.add(Conv2D(256 , (3,3) , strides = 1 , padding = 'same' , 
                 activation = 'relu'))
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))

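def ReshapeLayer(x):
    # x is the last feature map, shape (batch_size, H, W, channels)
    shape = x.shape

    # Possibility 1: rows as time steps -> (batch_size, H, W*channels)
    reshape = Reshape((shape[1], shape[2]*shape[3]))(x)

    # Possibility 2: columns as time steps -> (batch_size, W, H*channels)
    # transpose = Permute((2,1,3))(x)
    # reshape = Reshape((shape[2], shape[1]*shape[3]))(transpose)

    return reshape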

model.add(Lambda(ReshapeLayer))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))

model.add(Dense(6, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', 
              metrics=['accuracy'])
model.summary()
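To see what the two possibilities actually feed to the LSTM, here is a small NumPy sketch on a toy feature map. The sizes (batch=2, H=3, W=4, C=5) are made up and deliberately non-square so that the two results are easy to tell apart. `transpose(0, 2, 1, 3)` is used as the NumPy counterpart of Permute((2,1,3)), which leaves the batch axis alone.

```python
import numpy as np

# Toy feature map with shape (batch, H, W, C); sizes are illustrative only.
batch, H, W, C = 2, 3, 4, 5
x = np.arange(batch * H * W * C).reshape(batch, H, W, C)

# Possibility 1: each row of the feature map becomes one time step.
rows_as_steps = x.reshape(batch, H, W * C)

# Possibility 2: swap H and W first, so each column becomes one time step.
cols_as_steps = x.transpose(0, 2, 1, 3).reshape(batch, W, H * C)

print(rows_as_steps.shape)  # (2, 3, 20)
print(cols_as_steps.shape)  # (2, 4, 15)

# Time step 0 of sample 0 holds the whole first row or the whole first column.
print(np.array_equal(rows_as_steps[0, 0], x[0, 0].ravel()))        # True
print(np.array_equal(cols_as_steps[0, 0], x[0, :, 0, :].ravel()))  # True
```

In both cases the "time" axis is a spatial axis of a single frame's feature map, so the choice decides whether the LSTM scans the image top to bottom or left to right.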
