I am training a CNN on the KTH dataset to classify 6 classes of human actions.
This is the code for the layers of my model:
# Imports are assumed here (tensorflow.keras); the original post does not show them.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv3D, MaxPooling3D, Dense, Flatten, Dropout

model = Sequential()
# Block 1: 64 filters, followed by 2x downsampling
model.add(Conv3D(filters=64,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 activation='relu',
                 input_shape=X_train.shape[1:]))
model.add(MaxPooling3D(pool_size=2,
                       strides=(2, 2, 2),
                       padding='same'))
# Block 2: 128 filters
model.add(Conv3D(filters=128,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 activation='relu'))
model.add(MaxPooling3D(pool_size=2,
                       strides=(2, 2, 2),
                       padding='same'))
# Block 3: two convolutions with 256 filters each
model.add(Conv3D(filters=256,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 activation='relu'))
model.add(Conv3D(filters=256,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 activation='relu'))
model.add(MaxPooling3D(pool_size=2,
                       strides=(2, 2, 2),
                       padding='same'))
# Block 4: 512 filters
model.add(Conv3D(filters=512,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 activation='relu'))
# Classifier head (layer order kept exactly as in the original post)
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
#model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(6, activation='softmax'))
model.summary()
Training
My problem is that both training and validation accuracy do not change; they are essentially frozen from the first epoch, in the first 6 epochs as well as the last 6. The training loss is very high, and the validation loss does not change.
I am confused: is the model underfitting or overfitting? How can I fix this problem? Will dropout help, given that I can't do data augmentation on videos (that is my assumption)?
I greatly appreciate any suggestion.
It depends on how you use the 200 frames of video as training data to classify an action. Your training data has too much bias. Since it is sequential data that you are classifying, you should use a memory-based architecture or a concatenation model.
You are feeding frames with values in the 0-1 range and using ReLU. In the dying ReLU problem, the model freezes and does not learn at all, because ReLU outputs the maximum of 0 and weight*input (plus the bias, if one is added), so a unit whose pre-activation stays negative outputs 0 and receives no gradient. I am not sure whether you will get good accuracy, but you can try the following to make sure the model trains properly and avoids the dying ReLU problem:
Use leaky ReLU with alpha >= 0.2. Do not normalize the frames; instead, just convert them to grayscale to reduce the training cost. Don't take 200 frames from the middle; divide every video into equal-sized chunks of frames and take 2 or 3 consecutive frames from each chunk. Also try adding more dense layers, as they help with classification.
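To make two of these suggestions concrete, here is a minimal pure-Python sketch, not a definitive implementation. The first part compares ReLU with leaky ReLU and their gradients for a negative input. The second part picks frame indices by splitting a video into equal chunks. The function names (`relu`, `leaky_relu`, `sample_chunk_frames`) and the choice to take the consecutive frames from the start of each chunk are assumptions made for illustration only.

```python
def relu(x):
    """Standard ReLU: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU: zero for non-positive inputs, so such units stop learning."""
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: negative inputs are scaled by alpha instead of being zeroed."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.2):
    """Gradient of leaky ReLU: alpha (non-zero) for non-positive inputs."""
    return 1.0 if x > 0 else alpha


def sample_chunk_frames(num_frames, num_chunks, frames_per_chunk=2):
    """Split frame indices 0..num_frames-1 into num_chunks equal chunks and
    return frames_per_chunk consecutive indices from each chunk.

    Assumption: the consecutive frames are taken from the start of each chunk.
    Frames left over after the last full chunk are ignored.
    """
    chunk_len = num_frames // num_chunks
    if chunk_len < frames_per_chunk:
        raise ValueError("chunks are shorter than frames_per_chunk")
    indices = []
    for chunk in range(num_chunks):
        start = chunk * chunk_len
        indices.extend(range(start, start + frames_per_chunk))
    return indices


# A negative pre-activation: ReLU passes no gradient, leaky ReLU still does.
print(relu(-1.0), relu_grad(-1.0))              # 0.0 0.0
print(leaky_relu(-1.0), leaky_relu_grad(-1.0))  # -0.2 0.2

# 200 frames, 10 chunks, 2 consecutive frames per chunk -> 20 frame indices.
print(sample_chunk_frames(200, 10, 2)[:6])      # [0, 1, 20, 21, 40, 41]
```

In Keras, the leaky variant is available as the `LeakyReLU` layer, which would be added after a convolution that has no built-in activation.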
I worked on almost the same problem. What I did was use Conv2D after merging the frames together: if you have 10 frames of size 64,64,3 each, then instead of running Conv3D on them, I ran Conv2D on the merged 640,64,3 input, which resulted in 86% accuracy on 16 classes of videos.
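The merging step can be sketched roughly as follows, assuming the frames are stacked along the first axis so that 10 frames of 64,64,3 become one 640,64,3 input. The helper name `merge_frames` and the dummy pixel values are hypothetical; nested lists are used only to keep the sketch dependency-free.

```python
def merge_frames(frames):
    """Stack equally shaped frames along the first axis.

    Each frame is a nested list of shape (rows, cols, channels); the result
    has shape (len(frames) * rows, cols, channels).
    """
    if not frames:
        raise ValueError("need at least one frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != rows or any(len(row) != cols for row in frame):
            raise ValueError("all frames must have the same shape")
    # Concatenating the row lists of every frame stacks the frames vertically.
    return [row for frame in frames for row in frame]


# Ten dummy 64x64x3 frames; frame i is filled with the value i.
frames = [[[[i, i, i] for _ in range(64)] for _ in range(64)] for i in range(10)]
merged = merge_frames(frames)

shape = (len(merged), len(merged[0]), len(merged[0][0]))
print(shape)  # (640, 64, 3)
```

With NumPy arrays, the same stacking is `np.concatenate(frames, axis=0)`; the merged array can then be fed to a 2D convolution like a single image.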