How to Increase Accuracy of CNN on Image Recognition

I am training a CNN for image classification. Specifically, I am trying to create a lip reader that can classify an image of a segmented mouth by its associated phoneme. The images are 64x64 and are flattened into a 1D array of length 4096. I have included the code for my current model below, along with its performance graphs and metrics. Does anyone have advice on how I can modify this model to raise the accuracy?

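import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Dropout, Flatten, Dense, Activation
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# on_bad_lines="skip" replaces the deprecated error_bad_lines=False (pandas >= 1.3)
df = pd.read_csv("/kaggle/input/labeled-frames-resized/labeled_frames.csv", on_bad_lines="skip")
# Encode the phoneme strings as integer class ids
labelencoder = LabelEncoder()
df['Phoneme'] = labelencoder.fit_transform(df['Phoneme'])
labels = np.asarray(df[['Phoneme']].copy())
# Drop the first column (assumed here to be the label column) so only pixel values remain
df = df.drop(df.columns[0], axis = 1)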

X_train, X_test, y_train, y_test = train_test_split(df, labels, random_state = 42, test_size = 0.2, stratify = labels)
X_train = tf.reshape(X_train, (8113, 4096, 1))
X_test = tf.reshape(X_test, (2029, 4096, 1))

model = Sequential()
model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid', input_shape= (4096, 1)))
model.add(MaxPooling1D(pool_size=2))

model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid'))
model.add(MaxPooling1D(pool_size=2))

model.add(Dropout(0.5))
model.add(MaxPooling1D(pool_size=2))

model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid'))
model.add(MaxPooling1D(pool_size=2))

model.add(Dropout(0.2))
model.add(MaxPooling1D(pool_size=2))

model.add(Conv1D(filters= 128, kernel_size=3, activation ='relu',strides = 2, padding = 'valid'))
model.add(MaxPooling1D(pool_size=2))

model.add(Dropout(0.2))
model.add(MaxPooling1D(pool_size=2))

model.add(Flatten())
model.add(Dense(39)) 
model.add(Activation('softmax'))

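# Note: this optimizer object is never passed to compile() below, which uses the
# string 'adam' and therefore Adam's default learning rate.
optimizer = keras.optimizers.Adam(learning_rate=0.4)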

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


history = model.fit(X_train,y_train, epochs = 500, batch_size = 2048, validation_data = (X_test, y_test), shuffle = True)
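
To see how quickly the stacked strided convolutions and pooling layers shrink the input, the sequence length can be traced layer by layer. This is only a sketch: it assumes 'valid' padding throughout and a pooling stride equal to the pool size (the Keras default), and the helper names `conv_out`, `pool_out`, and `trace_lengths` are made up for illustration.

```python
# Trace the sequence length through the Conv1D stack from the question.
# Assumes 'valid' padding and pooling stride == pool size (Keras default).

def conv_out(n, kernel=3, stride=2):
    """Output length of a 'valid' 1D convolution."""
    return (n - kernel) // stride + 1

def pool_out(n, pool=2):
    """Output length of 'valid' max pooling with stride equal to pool size."""
    return (n - pool) // pool + 1

# Layer order copied from the question; Dropout does not change the length.
layers = ["conv", "pool",
          "conv", "pool", "pool",
          "conv", "pool", "pool",
          "conv", "pool", "pool"]

def trace_lengths(n, layers):
    """Return the length after every layer, starting with the input length."""
    lengths = [n]
    for kind in layers:
        n = conv_out(n) if kind == "conv" else pool_out(n)
        lengths.append(n)
    return lengths

lengths = trace_lengths(4096, layers)
print(lengths)
# [4096, 2047, 1023, 511, 255, 127, 63, 31, 15, 7, 3, 1]
```

Under these assumptions the 4096-step input is reduced to a single step of 128 channels before Flatten, so the final Dense layer would see only 128 features.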

Model graph

Model metrics

You can easily convert it to 2D convolution:

model.add(Conv2D(filters= 128, kernel_size=(3,3), activation ='relu',strides = (2,2), 
                 padding = 'valid', input_shape= (64,64,1)))
model.add(MaxPooling2D(pool_size=(2,2)))
...
model.add(Flatten())
model.add(Dense(39)) 
model.add(Activation('softmax'))
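
A 2D model with `input_shape=(64,64,1)` needs the flattened rows reshaped back into images first. A minimal sketch with NumPy, assuming the frames were flattened in row-major order (the question does not say); the array `flat` is a made-up stand-in for the real pixel data:

```python
import numpy as np

# Hypothetical stand-in for the flattened frames: 5 images of 4096 pixels each.
flat = np.arange(5 * 4096, dtype=np.float32).reshape(5, 4096)

# Restore the 64x64 grid and add a trailing channel axis for grayscale input.
# Assumes row-major flattening; a different order would need a transpose.
images = flat.reshape(-1, 64, 64, 1)

print(images.shape)  # (5, 64, 64, 1)
```

Using `-1` for the first axis lets NumPy infer the number of images, which avoids hard-coding the train and test set sizes.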

I've only worked with Conv1D so far because it seemed easier.

Can 1D convolution be used on images?

  • Yes, you can, but it is not recommended unless you have a very specific case and know what you are doing. Suppose your images are 1024x1024: what happens when you flatten them? 2D convolutions extract more information than 1D convolutions.
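
To make the 1024x1024 case concrete, here is a small sketch of where neighbouring pixels end up after flattening. It assumes row-major flattening, and `flat_index` is a helper invented for this illustration:

```python
height, width = 1024, 1024

def flat_index(row, col, width=width):
    """Position of pixel (row, col) after row-major flattening."""
    return row * width + col

center = flat_index(500, 500)
right = flat_index(500, 501)   # horizontal neighbour
below = flat_index(501, 500)   # vertical neighbour

print(right - center)  # 1
print(below - center)  # 1024
```

A 1D kernel of size 3 covers the horizontal neighbour but never the vertical one, which sits 1024 positions away in the flattened array.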

Explanation:

You can indeed use 1D convolution on images, but not in every situation. (I might be wrong.) When you flatten them, every pixel becomes a feature. If we wanted every pixel to be a feature, we could just as well use ordinary Dense layers after flattening, but there would be a lot of parameters to train. Here is what I mean (total parameter count not included):

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(...)
    ...
])
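
As a rough illustration of the parameter count, the weights of a single Dense layer on a flattened 64x64 image can be compared with those of a single 3x3 Conv2D layer. The choice of 128 Dense units is an assumption made only to match the 128 filters used in the question, and both helper functions are hypothetical:

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a 2D convolution layer."""
    return kernel_h * kernel_w * in_channels * filters + filters

# Assumed sizes: 64x64 grayscale input, 128 units / 128 filters.
dense = dense_params(64 * 64, 128)
conv = conv2d_params(3, 3, 1, 128)

print(dense)  # 524416
print(conv)   # 1280
```

The convolution reuses the same small kernel at every position, which is why it needs far fewer parameters than a layer that connects every pixel to every unit.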

When you flatten them you might break the spatial coherence of the images. Using 2D convolutions might gain you accuracy. With 2D convolutions, we scan across the image to see what can be extracted as important features, then summarize them with max or average pooling.
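
The pooling step can be sketched in plain NumPy. This is an illustrative 2x2 max pooling with stride 2 on a made-up 4x4 feature map, not the Keras implementation:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2; odd trailing rows/columns are cropped."""
    h, w = feature_map.shape
    h2, w2 = h // 2, w // 2
    # Split the map into non-overlapping 2x2 blocks, then take each block's max.
    blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

# Made-up feature map for illustration.
fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]])

pooled = max_pool_2x2(fm)
print(pooled)
# [[4 2]
#  [2 8]]
```

Each output value summarizes a 2x2 neighbourhood that is contiguous in both directions, which only makes sense when the image has kept its 2D layout.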

  • You will not be able to capture as much information with 1D convolutions.

  • We can feed the pooled feature maps into fully connected layers before making predictions.
