I'm working on a classifier for video sequences. It should take several video frames as input and output a label, either 0 or 1. So, it is a many-to-one network.
I already have a classifier for single frames. This classifier applies several Conv2D convolutions, then GlobalAveragePooling2D. This results in a 1D vector of length 64. The original per-frame classifier then has a Dense layer with softmax activation.
Now I would like to extend this classifier to work with sequences. Ideally, sequences should be of varying length, but for now I fix the length to 4.
To extend my classifier, I'm going to replace the Dense layer with an LSTM layer with 1 unit. So, my goal is to have the LSTM layer take several 1D vectors of length 64, one by one, and output a label.
Schematically, what I have now:
input(99, 99, 3) - [convolutions] - features(1, 64) - [Dense] - [softmax] - label(1, 2)
Desired architecture:
4x { input(99, 99, 3) - [convolutions] - features(1, 64) } - [LSTM] - label(1, 2)
I cannot figure out how to do this with Keras.
Here is my code for the convolutions:
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, GlobalAveragePooling2D, \
    LSTM, TimeDistributed

IMAGE_WIDTH = 99
IMAGE_HEIGHT = 99
IMAGE_CHANNELS = 3

convolutional_layers = Sequential([
    Conv2D(input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS),
           filters=6, kernel_size=(3, 3), strides=(2, 2), activation='relu',
           name='conv1'),
    BatchNormalization(),
    Conv2D(filters=64, kernel_size=(1, 1), strides=(1, 1), activation='relu',
           name='conv5_pixel'),
    BatchNormalization(),
    GlobalAveragePooling2D(name='avg_pool6'),
])
Here is the summary:
In [24]: convolutional_layers.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1 (Conv2D) (None, 49, 49, 6) 168
_________________________________________________________________
batch_normalization_3 (Batch (None, 49, 49, 6) 24
_________________________________________________________________
conv5_pixel (Conv2D) (None, 49, 49, 64) 448
_________________________________________________________________
batch_normalization_4 (Batch (None, 49, 49, 64) 256
_________________________________________________________________
avg_pool6 (GlobalAveragePool (None, 64) 0
=================================================================
Total params: 896
Trainable params: 756
Non-trainable params: 140
Now I want a recurrent layer to process sequences of these 64-dimensional vectors and output a label for each sequence.
I've read in the documentation that the TimeDistributed layer applies its wrapped layer to every time slice of the input data.
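For instance, a minimal self-contained illustration of that behavior (my own toy example, not from my model) is wrapping a Dense layer so it runs independently on each of 10 time steps:

```python
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed

# A Dense(8) applied to each of 10 time slices of 16-dim vectors:
# (batch, 10, 16) -> (batch, 10, 8)
m = Sequential([TimeDistributed(Dense(8), input_shape=(10, 16))])
print(m.output_shape)  # (None, 10, 8)
```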
I continue my code:
FRAME_NUMBER = 4

td = TimeDistributed(convolutional_layers, input_shape=(FRAME_NUMBER, 64))
model = Sequential([
    td,
    LSTM(units=1)
])
The result is the exception IndexError: list index out of range.
I get the same exception for
td = TimeDistributed(convolutional_layers, input_shape=(None, FRAME_NUMBER, 64))
What am I doing wrong?
Expanding on the comments into an answer: the TimeDistributed layer applies the given layer to every time step of the input. Hence, your TimeDistributed should receive the raw frames, i.e. an input of shape=(F_NUM, W, H, C), not the already-pooled features. After the convolutions are applied to every image, you get back (F_NUM, 64), which are the features for every frame; those per-frame feature vectors are what the LSTM then consumes.
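Putting that together, a sketch of the corrected model (reusing your convolutional_layers and constants; only the input_shape of TimeDistributed changes, to the per-sample shape of a 4-frame sequence of images):

```python
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, GlobalAveragePooling2D, \
    LSTM, TimeDistributed

IMAGE_WIDTH = 99
IMAGE_HEIGHT = 99
IMAGE_CHANNELS = 3
FRAME_NUMBER = 4

# Your per-frame feature extractor, unchanged: (W, H, C) -> (64,)
convolutional_layers = Sequential([
    Conv2D(input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS),
           filters=6, kernel_size=(3, 3), strides=(2, 2), activation='relu',
           name='conv1'),
    BatchNormalization(),
    Conv2D(filters=64, kernel_size=(1, 1), strides=(1, 1), activation='relu',
           name='conv5_pixel'),
    BatchNormalization(),
    GlobalAveragePooling2D(name='avg_pool6'),
])

model = Sequential([
    # input_shape describes ONE sample: a sequence of FRAME_NUMBER raw frames.
    # TimeDistributed runs the conv stack on each frame:
    # (batch, 4, 99, 99, 3) -> (batch, 4, 64)
    TimeDistributed(convolutional_layers,
                    input_shape=(FRAME_NUMBER, IMAGE_WIDTH, IMAGE_HEIGHT,
                                 IMAGE_CHANNELS)),
    # The LSTM consumes the sequence of 64-dim feature vectors: -> (batch, 1)
    LSTM(units=1),
])
print(model.output_shape)  # (None, 1)
```

Note that an LSTM with 1 unit emits a single value per sequence; to reproduce the two-class softmax output of your per-frame classifier you would instead follow a wider LSTM with a Dense(2, activation='softmax') head.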