How can I fix CNN layer dimension errors while keeping a fixed kernel size and a fixed number of filters?

Question

I am currently trying to recreate the CNN model used in the paper

"Using CNN for facial expression recognition: a study of the effects of kernel size and numbers of filters on accuracy" by Abhinav Agrawal and Namita Mittal ( https://doi.org/10.1007/s00371-019-01630-9 ).

It's unique in that it uses fixed kernel sizes and a fixed number of filters, with no dropout layers and no fully connected layer. The proposed model from the paper is as follows:

Input data (64 × 64) grayscale image
Data augmentation
CONV 8 × 8 × 32, BATCH NORM
CONV 8 × 8 × 32, RELU, STRIDE (2 × 2)
CONV 8 × 8 × 32, BATCH NORM
CONV 8 × 8 × 32, RELU, STRIDE (2 × 2)
CONV 8 × 8 × 32, BATCH NORM
CONV 8 × 8 × 32, RELU, STRIDE (2 × 2)
CONV 8 × 8 × 32, BATCH NORM
CONV 8 × 8 × 32, RELU, STRIDE (2 × 2)
CONV 8 × 8 × 32, BATCH NORM
CONV 8 × 8 × 32, RELU, STRIDE (2 × 2)
CONV 8 × 8 × 32, BATCH NORM
CONV 8 × 8 × 32, RELU, STRIDE (2 × 2)
CONV 8 × 8 × 32, BATCH NORM
CONV 8 × 8 × 32, RELU, STRIDE (2 × 2)
CONV 8 × 8 × 32, BATCH NORM
CONV 7 × 7 × 7, RELU, STRIDE (1 × 1)
SOFTMAX

My code is as follows. It tries to faithfully recreate the model from the paper, using the same dataset and images.

import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, BatchNormalization
from tensorflow.keras.models import Sequential
#from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt

def plot_model_acc(history):
    #model history for accuracy
    plt.plot(history.history['accuracy'], label='accuracy')
    plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
    plt.title('Model Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend(['train','test'], loc='upper left')
    plt.savefig('model_acc.png', bbox_inches='tight')
    plt.close()

def plot_model_loss(history):
    #model history for loss
    plt.plot(history.history['loss'], label='loss')
    plt.plot(history.history['val_loss'], label='val_loss')
    plt.title('Model Loss')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.savefig('model_loss.png', bbox_inches='tight')
    plt.close()



#model parameters
train_directory = 'train'
test_directory = 'test'
val_directory = 'validation'
num_train = 28709
num_test = 7178
num_val = 7178
batch_size = 64
num_epoch = 10


#creating datagen class and rescaling pixel values from  0-255 to 0-1 for grayscale
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)

#angry 0, disgusted 1, fearful 2, happy 3, neutral 4, sad 5, surprised 6

train_iterator = train_datagen.flow_from_directory(
    train_directory,
    target_size=(64, 64),
    class_mode='categorical',
    batch_size=batch_size,
    color_mode='grayscale')

test_iterator = test_datagen.flow_from_directory(
    test_directory,
    target_size=(64, 64),
    class_mode='categorical',
    batch_size=batch_size,
    color_mode='grayscale')

val_iterator = val_datagen.flow_from_directory(
    val_directory,
    target_size=(64, 64),
    class_mode='categorical',
    batch_size=batch_size,
    color_mode='grayscale')
#color_mode='grayscale' loads each image with a single channel
#class_mode='categorical' returns 2D one-hot encoded labels

#model creation
#output size after a conv layer is
#floor((W - K + 2P) / S) + 1
#W is the input size - here 64 (for a 64x64x1 image)
#K is the kernel size - here 8
#P is the padding - here 0 (the Conv2D default, padding='valid')
#S is the stride - 1 by default, 2 where strides=(2,2)


model = Sequential()

model.add(Conv2D(32, kernel_size=(8, 8), input_shape=(64,64,1)))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size=(8, 8), activation='relu', strides=(2,2)))
model.add(Conv2D(32, kernel_size=(8, 8)))
model.add(BatchNormalization())

model.summary()

model.add(Conv2D(32, kernel_size=(8, 8), activation='relu', strides=(2,2)))
model.add(Conv2D(32, kernel_size=(8, 8)))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size=(8, 8), activation='relu', strides=(2,2)))
model.add(Conv2D(32, kernel_size=(8, 8)))
model.add(BatchNormalization())

model.summary()

model.add(Conv2D(32, kernel_size=(8, 8), activation='relu', strides=(2,2)))
model.add(Conv2D(32, kernel_size=(8, 8)))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size=(8, 8), activation='relu', strides=(2,2)))
model.add(Conv2D(32, kernel_size=(8, 8)))
model.add(BatchNormalization())

model.summary()

model.add(Conv2D(32, kernel_size=(8, 8), activation='relu', strides=(2,2)))
model.add(Conv2D(32, kernel_size=(8, 8)))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size=(8, 8), activation='relu', strides=(2,2)))
model.add(Conv2D(32, kernel_size=(8, 8)))
model.add(BatchNormalization())

model.summary()

model.add(Conv2D(7, kernel_size=(7, 7), strides=(1,1)))
model.add(layers.Activation('softmax'))

model.summary()


#optimizer definition, learning rate, lower takes more time, but high values may cause unstable training
adam_optimizer = keras.optimizers.Adam(learning_rate=0.0001)

#CategoricalCrossentropy for one-hot encoded labels; the model already ends in softmax, so from_logits must be False
model.compile(optimizer=adam_optimizer,
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

history = model.fit(
    train_iterator, 
    steps_per_epoch=num_train // batch_size, 
    epochs=num_epoch,
    validation_data=test_iterator,
    validation_steps=num_val // batch_size
    )

print(history.history.keys())
plot_model_acc(history)
plot_model_loss(history)
model.save_weights('model.h5')

I was running into the error:

"tensorflow.python.framework.errors_impl.InvalidArgumentError: Negative dimension size caused by subtracting 8 from 6 for 'conv2d_4/Conv2D' with input shapes: [?, 6, 6, 32], [8, 8, 32, 32]".

I realize that the output shape of the consecutive convolution layers is probably becoming too small, which triggers the error. How can I fix this? Padding? What is throwing me off is that I can't change the kernel size or the number of filters to fix the error.

Any insight would be appreciated.
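The suspicion about shrinking outputs can be checked by hand with the output-size formula quoted in the code comments, floor((W - K + 2P) / S) + 1. The sketch below is plain Python and does not use TensorFlow. The helper names `conv_out` and `trace_valid` are made up for illustration, and it assumes the layer order from the posted code: alternating stride-1 and stride-2 convolutions, an 8 × 8 kernel, and no padding.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution: floor((W - K + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1


def trace_valid(size=64, kernel=8, max_layers=15):
    """Alternate stride-1 and stride-2 convolutions with no padding.

    Returns the list of feature-map widths and the index of the first
    layer whose input is smaller than the kernel (None if none fails).
    """
    sizes = [size]
    for i in range(max_layers):
        if size < kernel:
            return sizes, i  # this layer cannot be applied
        stride = 1 if i % 2 == 0 else 2
        size = conv_out(size, kernel, stride)
        sizes.append(size)
    return sizes, None


sizes, failed_at = trace_valid()
print(sizes)      # [64, 57, 25, 18, 6]
print(failed_at)  # 4
```

Under these assumptions the width drops to 6 after the fourth convolution, so the layer at index 4 receives a 6 × 6 input that is smaller than its 8 × 8 kernel. That is consistent with the error message naming 'conv2d_4/Conv2D' and an input shape of [?, 6, 6, 32].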

Answer 1

Set padding to same in each convolution layer:

model.add(Conv2D(32, kernel_size=(8, 8),padding='same'))

The default padding is valid, which performs no padding. same pads the input so that, with a stride of 1, the output shape is the same as the input shape. With a stride of 2 the output is half the input size (rounded up), so it no longer shrinks by the kernel size at every layer.
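To see what padding='same' does to the layer sizes in this model, the widths can be traced in plain Python. With 'same' padding the output width is ceil(W / S), independent of the kernel size. This is only a sketch: `same_out` and `trace_same` are hypothetical helper names, and the layout of seven stride-1/stride-2 pairs is taken from the model listing in the question. Whether the paper itself used this padding is not stated in the post.

```python
import math


def same_out(size, stride=1):
    """Output width of a convolution with padding='same': ceil(W / S)."""
    return math.ceil(size / stride)


def trace_same(size=64, num_pairs=7):
    """Trace widths through pairs of (stride-1 conv, stride-2 conv)."""
    sizes = [size]
    for _ in range(num_pairs):
        size = same_out(size, stride=1)  # CONV + BATCH NORM keeps the size
        sizes.append(size)
        size = same_out(size, stride=2)  # CONV + RELU, stride 2 halves it
        sizes.append(size)
    return sizes


print(trace_same())
# [64, 64, 32, 32, 16, 16, 8, 8, 4, 4, 2, 2, 1, 1, 1]
```

The width never goes below 1, so every 8 × 8 convolution can be applied and the kernel size and filter count stay untouched. Under this assumption the feature map is 1 × 1 by the time it reaches the final 7-filter convolution, which leaves one value per class.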
