
Accuracy of a CNN model never improves on the training or validation set

I am training a CNN model on the KTH dataset to detect 6 classes of human actions.

Data Processing

  • The dataset consists of 599 videos; each action has 99-100 videos performed by 25 different persons. I split the data into 300 videos for training, 98 videos for validation, and 200 videos for the test set.
  • I reduced the resolution to 50x50 pixels so that I don't run out of memory while processing.
  • I extracted 200 frames from the middle of each video.
  • I normalized the pixel values from 0-255 to 0-1.
  • Finally, I one-hot encoded the class labels (a preprocessing sketch follows this list).
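The post does not show the loading code; below is a minimal sketch of the preprocessing steps above, assuming OpenCV (cv2) for reading frames and Keras' to_categorical for the one-hot step. The function and variable names here are illustrative, not taken from the post.

import cv2
import numpy as np
from tensorflow.keras.utils import to_categorical

def load_video(path, n_frames=200, size=(50, 50)):
    """Read n_frames from the middle of a video, resized to 50x50 and scaled to [0, 1]."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    start = max((total - n_frames) // 2, 0)            # begin in the middle of the clip
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    frames = []
    for _ in range(n_frames):
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)
        frames.append(frame.astype(np.float32) / 255.0)  # normalize 0-255 -> 0-1
    cap.release()
    return np.stack(frames)                              # shape: (n_frames, 50, 50, 3)

# labels: integer class ids (0-5) for the 6 KTH actions
# y_train = to_categorical(labels, num_classes=6)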

Model architecture

This is my model architecture, and this is the code of the NN layers.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv3D, MaxPooling3D, Dense, Dropout, Flatten

model = Sequential()

# Block 1: 3D convolution over (frames, height, width), followed by pooling
model.add(Conv3D(filters=64,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 activation='relu',
                 input_shape=X_train.shape[1:]))
model.add(MaxPooling3D(pool_size=2,
                       strides=(2, 2, 2),
                       padding='same'))

# Block 2
model.add(Conv3D(filters=128,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 activation='relu'))
model.add(MaxPooling3D(pool_size=2,
                       strides=(2, 2, 2),
                       padding='same'))

# Block 3: two stacked convolutions before pooling
model.add(Conv3D(filters=256,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 activation='relu'))
model.add(Conv3D(filters=256,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 activation='relu'))
model.add(MaxPooling3D(pool_size=2,
                       strides=(2, 2, 2),
                       padding='same'))

# Block 4
model.add(Conv3D(filters=512,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 activation='relu'))

# Classifier head (the Dense layers come before Flatten, as in the original post)
model.add(Dense(4096, activation='relu'))
model.add(Dense(4096, activation='relu'))
#model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(6, activation='softmax'))

model.summary()
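The compile and fit calls are not shown in the question. For reference, a typical setup for a model like this might look like the sketch below; the optimizer, batch size, epoch count, and the X_val/y_val names are assumptions, not taken from the post.

# Assumed training setup (optimizer, batch size and epochs are illustrative guesses):
model.compile(optimizer='adam',
              loss='categorical_crossentropy',     # labels are one-hot encoded
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),  # X_val/y_val: hypothetical names
                    batch_size=8,                    # 3D conv activations are memory-heavy
                    epochs=50)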

Training

My problem is that both the training and validation accuracy do not change; they basically freeze from the first epoch. These are the training steps: here are the first 6 epochs and here are the last 6 epochs. The loss looks like this: the training loss is very high, and the validation loss doesn't change. The training curve looks like this.

I am confused: is the model underfitting or overfitting? How can I fix this problem? Will dropout help, since I assume I can't do data augmentation on videos?

I would greatly appreciate any suggestions.

It depends on how you use the 200 frames of video as training data to classify an action. Your training data has too much bias. Since it is sequential data being classified, you have to go for a memory-based architecture or a concatenation model.
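The answer does not include code. As one possible reading of "memory-based architecture", a minimal ConvLSTM sketch in Keras could look like the following; the layer sizes and the (200, 50, 50, 3) input shape are illustrative assumptions, not something the answerer specified.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ConvLSTM2D, BatchNormalization, GlobalAveragePooling2D, Dense

# A memory-based alternative: a ConvLSTM scans the frame sequence and keeps a
# spatial hidden state, so temporal order is modelled explicitly.
model = Sequential([
    ConvLSTM2D(filters=32, kernel_size=(3, 3), padding='same',
               return_sequences=False,          # keep only the final hidden state
               input_shape=(200, 50, 50, 3)),   # (frames, height, width, channels)
    BatchNormalization(),
    GlobalAveragePooling2D(),
    Dense(6, activation='softmax'),             # 6 KTH action classes
])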

You are using 0-1 pixel values for the frames together with ReLU activations. In the dying-ReLU problem the model is frozen and doesn't learn at all, because ReLU outputs the maximum of 0 and weight*input (when no bias is added). You can do a few things to make sure the model trains properly; I am not sure whether you will get good accuracy, but you can try the following to avoid the dying-ReLU problem:

  • Use leaky ReLU with alpha >= 0.2 (see the sketch after this list).
  • Do not normalize the frames; instead just convert them to grayscale to reduce the training cost.
  • Don't take 200 frames from the middle; divide all videos into an equal number of frame chunks and take 2-3 consecutive frames from each chunk.
  • Also try adding more dense layers, as they help with classification.
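A minimal sketch of the leaky-ReLU suggestion applied to the first convolution of the question's model; only the activation changes, the rest of the layer is copied from the question.

from tensorflow.keras.layers import Conv3D, LeakyReLU

# Drop activation='relu' from the layer and add LeakyReLU separately,
# with alpha >= 0.2 as the answer suggests.
model.add(Conv3D(filters=64,
                 kernel_size=(3, 3, 3),
                 strides=(1, 1, 1),
                 padding='valid',
                 input_shape=X_train.shape[1:]))
model.add(LeakyReLU(alpha=0.2))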

I worked on almost the same problem, and what I did was use Conv2D after merging the frames together: if you have 10 frames of size 64x64x3 each, instead of doing Conv3D I did Conv2D on a 640x64x3 dataset, and it resulted in 86% accuracy on 16 classes for videos.
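The answer doesn't include code; a hedged sketch of the frame-merging idea (10 frames of 64x64x3 stacked along the height axis into a single 640x64x3 image, then an ordinary Conv2D network) could look like this. The layer sizes are illustrative, not the answerer's actual model.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def merge_frames(clips):
    """Stack each clip's frames vertically: (n, 10, 64, 64, 3) -> (n, 640, 64, 3)."""
    n, f, h, w, c = clips.shape
    return clips.reshape(n, f * h, w, c)

# Ordinary 2D convolutions on the merged "tall" images.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(640, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(16, activation='softmax'),   # 16 classes, as in the answer
])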
