
Issue with variable image input resolution on MNIST dataset (while using CNN)

I'm a bit new to CNNs, so please correct me wherever possible!

I've been experimenting with the MNIST dataset for digit classification. I decided to take it one step further by passing my own handwritten digits into the model's predict method. I am aware that the MaxPooling2D layer only allows a fixed input resolution, so after some research I used GlobalMaxPooling2D instead, which solved the problem of variable input image resolution. The problem I am facing now is that predict accurately classifies images from the MNIST test set, but fails on my own handwritten digits. This is my model:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, GlobalMaxPooling2D, BatchNormalization

model = Sequential()
model.add(Conv2D(128, (5, 5), input_shape=(None, None, 1), data_format='channels_last'))
model.add(Dense(80, activation='relu'))  # applied position-wise across the feature map
model.add(GlobalMaxPooling2D())          # collapses spatial dims, so input size can vary
model.add(BatchNormalization())
model.add(Dense(10, activation='softmax'))

The model gives a training accuracy of 94.98% and a testing accuracy of 94.52%. To predict my own handwritten digits, I used an image with a resolution of 200x200. The model can somehow predict specific digits like 8, 6 and 1, but when I test any other digit, it still classifies it as 8, 6 or 1. Can anyone point out where I'm going wrong? Any help is appreciated!
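For reference, a minimal sketch of what such a prediction call might look like (the file name and the [0, 1] scaling are assumptions; the question does not show the loading code):

import numpy as np
from PIL import Image

img = Image.open('my_digit.png').convert('L')   # hypothetical file; grayscale, any resolution
x = np.asarray(img, dtype='float32') / 255.0    # assumes training images were scaled to [0, 1]
x = x.reshape(1, x.shape[0], x.shape[1], 1)     # add batch and channel axes -> (1, H, W, 1)
print(np.argmax(model.predict(x), axis=-1))     # index of the most probable digit class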

Several things can contribute to what you are seeing here. The optimization process matters: the way you optimize your model has a direct effect on how it performs. The proper choice of optimizer, learning rate, learning-rate decay regime and regularization are just a few examples. Other than that, your network is very simple and badly designed: you do not have enough Conv layers to exploit the image structure and build abstractions good enough for what you are asking it to do, and your model is not deep enough either.
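As a sketch of what tuning that setup might look like (the optimizer choice and all hyperparameter values below are illustrative assumptions, not known-good settings):

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay

# illustrative values only; these need tuning for the task at hand
lr_schedule = ExponentialDecay(initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
model.compile(optimizer=Adam(learning_rate=lr_schedule),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])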

MNIST by itself is a very easy task: with a linear classifier you can achieve roughly the accuracy you got, maybe even better. This shows you are not exploiting the capabilities of CNNs or deep architectures in any meaningful way. Even a simple model with one or two fully connected layers should give you better accuracy if properly trained.
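For comparison, a minimal fully connected baseline might look like the sketch below (it assumes the standard fixed 28x28 MNIST input, so it is a point of reference rather than a drop-in replacement for the variable-resolution setup):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

baseline = Sequential([
    Flatten(input_shape=(28, 28, 1)),  # fixed-size MNIST images
    Dense(256, activation='relu'),     # illustrative layer width
    Dense(10, activation='softmax'),
])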

Try making your network deeper: use more Conv layers, each followed by BatchNormalization and then ReLU, and avoid downsampling the input feature maps too quickly. When you downsample you lose information; to make up for that, you usually increase the number of filters in the next layer to compensate for the reduced representational capacity. In other words, gradually decrease the feature maps' spatial dimensions and likewise gradually increase the number of filters, as in the sketch below.
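A sketch of that kind of architecture, assuming the same Keras setup (all filter counts and kernel sizes are illustrative, not tuned values):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization, Activation,
                                     MaxPooling2D, GlobalMaxPooling2D, Dense)

deeper = Sequential()
# block 1: Conv -> BatchNormalization -> ReLU, small filter count early on
deeper.add(Conv2D(32, (3, 3), padding='same', input_shape=(None, None, 1)))
deeper.add(BatchNormalization())
deeper.add(Activation('relu'))
deeper.add(MaxPooling2D((2, 2)))   # gradual downsampling, one 2x step at a time
# block 2: more filters to compensate for the smaller feature maps
deeper.add(Conv2D(64, (3, 3), padding='same'))
deeper.add(BatchNormalization())
deeper.add(Activation('relu'))
deeper.add(MaxPooling2D((2, 2)))
deeper.add(GlobalMaxPooling2D())   # keeps variable input resolutions working
deeper.add(Dense(10, activation='softmax'))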

A huge number of neurons (filters) at the beginning is wasteful for your specific use case; 32 or 64 can be more than enough. As the network gets deeper, more abstract features are built on top of the more primitive ones found in the early layers, so having more filters in the later layers is usually more reasonable.

Early layers are responsible for creating primitive filters, and after some point adding more filters does not help performance; it just duplicates work already done by some previous filter.

The reason you see a difference in accuracy between runs is simply that you ended up in another local minimum! With the exact same configuration, if you train 100 times you will get 100 different results, some better than others and some worse, never the exact same value, unless you enforce deterministic behavior by using a fixed seed and running in CPU-only mode.
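A minimal sketch of pinning the seeds, assuming TensorFlow 2.x (as noted above, full determinism also requires CPU-only execution or deterministic GPU kernels):

import random
import numpy as np
import tensorflow as tf

random.seed(42)          # Python's own RNG
np.random.seed(42)       # NumPy, used by Keras for things like shuffling
tf.random.set_seed(42)   # TensorFlow ops such as weight initialization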
