
Stacking convolutional network and recurrent layer

I'm working on a classifier for video sequences. It should take several video frames as input and output a label, either 0 or 1. So, it is a many-to-one network.

I already have a classifier for single frames. This classifier makes several convolutions with Conv2D, then applies GlobalAveragePooling2D. This results in a 1D vector of length 64. The original per-frame classifier then has a Dense layer with softmax activation.
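For reference, here is roughly what that per-frame classifier looks like; the Dense head is only sketched (its exact form and the single_frame_classifier name are illustrative), while the convolutional part is shown in full further down:

from keras.models import Sequential
from keras.layers import Dense

# Sketch of the per-frame classifier: 64-d features -> 2-class softmax.
# convolutional_layers is the Conv2D / GlobalAveragePooling2D stack below.
single_frame_classifier = Sequential([
    convolutional_layers,
    Dense(2, activation='softmax'),
])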

Now I would like to extend this classifier to work with sequences. Ideally, sequences should be of varying length, but for now I fix the length to 4.

To extend my classifier, I'm going to replace the Dense layer with an LSTM layer with 1 unit. So, my goal is to have the LSTM layer take several 1D vectors of length 64, one by one, and output a label.

Schematically, what I have now:

input(99, 99, 3) - [convolutions] - features(1, 64) - [Dense] - [softmax] - label(1, 2)

Desired architecture:

4x { input(99, 99, 3) - [convolutions] - features(1, 64) } - [LSTM] - label(1, 2)

I cannot figure out how to do this with Keras.

Here is my code for the convolutions:

from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, GlobalAveragePooling2D, \
    LSTM, TimeDistributed

IMAGE_WIDTH = 99
IMAGE_HEIGHT = 99
IMAGE_CHANNELS = 3

# Per-frame feature extractor: one image in, a 64-dimensional vector out.
convolutional_layers = Sequential([
    Conv2D(input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS),
           filters=6, kernel_size=(3, 3), strides=(2, 2), activation='relu',
           name='conv1'),
    BatchNormalization(),
    Conv2D(filters=64, kernel_size=(1, 1), strides=(1, 1), activation='relu',
           name='conv5_pixel'),
    BatchNormalization(),
    GlobalAveragePooling2D(name='avg_pool6'),
])

Here is the summary:

In [24]: convolutional_layers.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1 (Conv2D)               (None, 49, 49, 6)         168
_________________________________________________________________
batch_normalization_3 (Batch (None, 49, 49, 6)         24
_________________________________________________________________
conv5_pixel (Conv2D)         (None, 49, 49, 64)        448
_________________________________________________________________
batch_normalization_4 (Batch (None, 49, 49, 64)        256
_________________________________________________________________
avg_pool6 (GlobalAveragePool (None, 64)                0
=================================================================
Total params: 896
Trainable params: 756
Non-trainable params: 140

Now I want a recurrent layer to process sequences of these 64-dimensional vectors and output a label for each sequence.

I've read in the manuals that the TimeDistributed layer applies its wrapped layer to every time slice of the input data.

I continue my code:

FRAME_NUMBER = 4

td = TimeDistributed(convolutional_layers, input_shape=(FRAME_NUMBER, 64))
model = Sequential([
    td,
    LSTM(units=1)
])

The result is the exception IndexError: list index out of range

The same exception occurs for

td = TimeDistributed(convolutional_layers, input_shape=(None, FRAME_NUMBER, 64))

What am I doing wrong?

Expanding on the comments into an answer: the TimeDistributed layer applies the given layer to every time step of the input. Hence, your TimeDistributed wrapper should be applied to every frame, which means giving it an input shape=(F_NUM, W, H, C). After applying the convolutions to every image, you get back (F_NUM, 64), which are the features for every frame.
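A minimal sketch of the corrected wiring under that reading, reusing convolutional_layers and the image constants defined above (the single-unit LSTM head is kept from the question; a wider LSTM followed by a Dense softmax layer would be a more common classification head, but that part is an assumption):

from keras.models import Sequential
from keras.layers import TimeDistributed, LSTM

FRAME_NUMBER = 4

model = Sequential([
    # TimeDistributed sees whole frames, so input_shape describes a
    # sequence of images, not the 64-dimensional feature vectors.
    TimeDistributed(convolutional_layers,
                    input_shape=(FRAME_NUMBER, IMAGE_WIDTH,
                                 IMAGE_HEIGHT, IMAGE_CHANNELS)),
    # The wrapper outputs (FRAME_NUMBER, 64); the LSTM consumes those
    # feature vectors one time step at a time and returns its final state.
    LSTM(units=1),
])

model.summary()

The model's input batches then have shape (batch, FRAME_NUMBER, IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS).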
