
MNIST recognition using Keras

How can I train the model to recognize five digits in one picture? The code is as follows:

import keras
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dropout, Dense, Input
from keras.models import Model, Sequential

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(28, 140, 1)))  # one 28x140 grayscale image holding five digits
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dropout(0.5))

Here should be a loop for recognizing each number in the picture, but I don't know how to realize it.

model.add(Dense(11, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(X_train, y_train,
          batch_size=1000,
          epochs=8,
          verbose=1,
          validation_data=(X_valid, y_valid))

The picture of the combined MNIST digits is as follows:

[Image: combined digits in one picture]

I suggest two possible approaches:

Case 1: The images are nicely structured.

In the example you provided, this is indeed the case, so if your data looks like the example in the link you provided, I suggest this approach.

In the link you provided, every image basically consists of five 28×28-pixel images placed side by side. In this case, I would suggest cutting the images (that is, cutting each image into 5 pieces) and training your model as with the usual MNIST data (for example, using the code you provided). Then, when you want to apply your model to classify new data, just cut each new image into 5 pieces as well. Classify each of these 5 pieces using your model, and then write the 5 predicted digits next to each other as the output.
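A minimal NumPy sketch of the cutting step, assuming the images come as an array of shape (N, 28, 140, 1) and the labels as an (N, 5) integer array with one class per position (both shapes, and the name `cut_into_digits`, are assumptions for illustration). A reshape does the cutting without a Python loop:

```python
import numpy as np


def cut_into_digits(images, labels):
    """Split (N, 28, 140, 1) images into (N*5, 28, 28, 1) single-digit pieces.

    `labels` is assumed to be an (N, 5) integer array holding the five
    digit classes of each image, left to right.
    """
    n = images.shape[0]
    # (N, 28, 5, 28, 1): split the 140-pixel width into 5 blocks of 28
    pieces = images.reshape(n, 28, 5, 28, 1)
    # Move the block axis next to the batch axis, then merge the two
    pieces = pieces.transpose(0, 2, 1, 3, 4).reshape(n * 5, 28, 28, 1)
    # Labels flatten in the same (image, position) order as the pieces
    return pieces, labels.reshape(n * 5)
```

The pieces can then be used like ordinary MNIST samples, with the labels one-hot encoded, for example via `keras.utils.to_categorical(piece_labels, 11)`.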

So, regarding this sentence:

Here should be a loop for recognizing each number in the picture, but I don't know how to realize it

you don't need a for loop. Just cut your images.
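After cutting, one `model.predict` call on all pieces is enough; the sketch below only regroups the resulting probabilities into one string per image. It assumes the pieces were produced in (image, position) order, as by a row-major reshape, and treats class 10 as a hypothetical "empty" label that is skipped; `join_predictions` is an illustrative name:

```python
import numpy as np


def join_predictions(probs, n_digits=5, empty_class=10):
    """Turn per-piece class probabilities back into one string per image.

    `probs` is assumed to be the (N * n_digits, num_classes) output of
    model.predict on the cut pieces, in (image, position) order.
    """
    # Most likely class per piece, regrouped as (N, n_digits)
    classes = probs.argmax(axis=1).reshape(-1, n_digits)
    # Write the digits next to each other, dropping "empty" positions
    return ["".join(str(c) for c in row if c != empty_class) for row in classes]
```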

Case 2: The images are not nicely structured.

In this case, each image is labeled with 5 numbers. So each row in y_train and y_valid will be a 0/1 vector with 55 entries. The first 11 entries are the one-hot encoding of the first digit, the next 11 entries are the one-hot encoding of the second digit, and so on. So each row in y_train will have exactly 5 entries equal to 1, and the rest equal to 0.
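A sketch of building these 55-entry targets with NumPy, assuming the raw labels are an (N, 5) integer array with classes 0 to 10 (the name `encode_multi_hot` is hypothetical). Position p with class c sets entry p * 11 + c:

```python
import numpy as np


def encode_multi_hot(digit_labels, num_classes=11):
    """Build (N, 5 * num_classes) 0/1 targets from (N, 5) digit labels."""
    n, n_positions = digit_labels.shape
    y = np.zeros((n, n_positions * num_classes), dtype="float32")
    # Start index of each position's block: 0, 11, 22, 33, 44
    offsets = np.arange(n_positions) * num_classes
    # Row index for every (image, position) pair, in row-major order
    rows = np.repeat(np.arange(n), n_positions)
    y[rows, (digit_labels + offsets).ravel()] = 1.0
    return y
```

For the labels 9, 7, 5, 4, 10 this sets entries 9, 18, 27, 37 and 54.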

In addition, instead of using a softmax activation on the output layer with categorical_crossentropy loss, use a sigmoid activation with 'binary_crossentropy' loss (see further discussion about the reasons here and here).

To summarize, replace this:

model.add(Dense(11, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

with this:

model.add(Dense(55, activation='sigmoid'))  # 5 positions x 11 classes, each scored independently

model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adadelta())
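Because the 55 sigmoid outputs are scored independently, nothing forces exactly one active class per block of 11. One reasonable way to read predictions back, sketched here as an assumption rather than part of the answer, is to take the argmax inside each block (`decode_multi_hot` is a hypothetical name):

```python
import numpy as np


def decode_multi_hot(outputs, n_positions=5, num_classes=11):
    """Recover the digit class per position from (N, 55) sigmoid outputs."""
    # View each row as 5 blocks of 11 scores
    blocks = outputs.reshape(-1, n_positions, num_classes)
    # Highest-scoring class within each block
    return blocks.argmax(axis=2)  # shape (N, n_positions)
```

Thresholding every output at 0.5 instead could yield zero or several active classes for the same position, which is why the per-block argmax is used here.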

The classic work in this area is 'Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks'.

Keras model (functional, not sequential):

from keras.layers import Input, Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.models import Model
from keras.optimizers import Adam

inputs = Input(shape=(28, 140, 1), name="input")
x = inputs
# The Input layer already fixes the shape, so Conv2D needs no input_shape
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = Conv2D(64, (3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)
x = Flatten()(x)
x = Dropout(0.5)(x)
# One softmax head per digit position, all sharing the same features
digit1 = Dense(10, activation='softmax', name='digit1')(x)
digit2 = Dense(10, activation='softmax', name='digit2')(x)
digit3 = Dense(10, activation='softmax', name='digit3')(x)
digit4 = Dense(10, activation='softmax', name='digit4')(x)
digit5 = Dense(10, activation='softmax', name='digit5')(x)
predictions = [digit1, digit2, digit3, digit4, digit5]
model = Model(inputs=inputs, outputs=predictions)
# The loss is applied to each output; accuracy is tracked per output
model.compile(optimizer=Adam(),
              loss='categorical_crossentropy',
              metrics={f'digit{i}': 'accuracy' for i in range(1, 6)})

PS: You may use 11 classes: 10 digits plus an empty space.
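With the multi-output model above, Keras expects one target array per output; `model.fit` accepts these as a dict keyed by output-layer name. A sketch, assuming (N, 5) integer labels and the output names digit1 to digit5 used in that model (`per_position_targets` is a hypothetical helper):

```python
import numpy as np


def per_position_targets(digit_labels, num_classes=10):
    """One one-hot (N, num_classes) target array per output head."""
    eye = np.eye(num_classes, dtype="float32")
    # Column p of the labels feeds the head named digit{p + 1}
    return {f"digit{p + 1}": eye[digit_labels[:, p]]
            for p in range(digit_labels.shape[1])}
```

Usage would look like `model.fit(X_train, per_position_targets(y_digits), ...)`, where `y_digits` stands for your (N, 5) label array; pass `num_classes=11` only if the Dense heads also have 11 units.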

Since you already have very well-behaved images, all you have to do is expand the number of classes in your model.

You can use 5 × 11 classes instead of just 11 classes.

The first 11 classes identify the first digit, the following 11 classes identify the second digit, and so on: a total of 55 classes, 11 for each position in the image.

So, in short:

  • X_training will be the entire image, as you have shown in the link, shaped as (28,140) or (140,28), depending on which method you use to load the images.
  • Y_training will be a 55-element vector, shape (55,), telling which digit is in each of the five positions.

Example: for the first image, with 9, 7, 5, 4, 10, you'd create Y_training with the following positions containing the value 1:

  • Y_training[9] = 1
  • Y_training[18] = 1 #(18=7+11)
  • Y_training[27] = 1 #(27=5+22)
  • Y_training[37] = 1 #(37=4+33)
  • Y_training[54] = 1 #(54=10+44)

Create your model layers the way you want, pretty much the same as for a regular MNIST model; that means there is no need for loops or anything like that.

But it will probably need to be a little bigger than before.

You will not be able to use categorical_crossentropy anymore, since you will have 5 correct classes per image instead of just 1. If you're using "sigmoid" activations at the end, binary_crossentropy should be a good replacement.

Make sure your last layer fits the 55-element vector, for instance Dense(55).

This problem was tackled by Yann LeCun in the '90s. You can find demos and papers on his website.

A less general solution is to train a CNN on single-digit MNIST and use this CNN to perform inference on images like the one you provided. Prediction is done by sliding the trained CNN over the multi-digit image and applying post-processing to aggregate the results and possibly estimate the bounding boxes.
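A rough NumPy sketch of the sliding-window idea. The trained single-digit CNN is not shown: `probs` stands for its softmax output on the crops (e.g. from `model.predict`). The stride, confidence threshold and suppression distance are arbitrary illustrative choices, and the post-processing is a simple greedy 1-D non-maximum suppression chosen here as an assumption, not a method from the answer. `sliding_window_view` requires NumPy 1.20 or later:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windows(image, width=28, stride=4):
    """Cut a (28, W) grayscale image into (K, 28, width, 1) crops.

    Returns the crops and the x-offset where each crop starts.
    """
    height = image.shape[0]
    # Shape (1, W - width + 1, height, width): one view per horizontal offset
    views = sliding_window_view(image, (height, width))
    crops = views[0, ::stride]
    starts = np.arange(0, image.shape[1] - width + 1, stride)
    return crops[..., np.newaxis], starts


def aggregate(probs, starts, width=28, min_conf=0.9):
    """Greedy 1-D non-maximum suppression over per-window predictions.

    Returns (x_start, digit) pairs, left to right; x_start together
    with `width` gives a crude bounding box.
    """
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    kept = []
    for i in np.argsort(-conf):          # most confident windows first
        if conf[i] < min_conf:
            break                        # all remaining windows are weaker
        # Skip windows overlapping an already-kept one by more than half
        if all(abs(starts[i] - starts[j]) >= width // 2 for j in kept):
            kept.append(i)
    kept.sort(key=lambda i: starts[i])
    return [(int(starts[i]), int(labels[i])) for i in kept]
```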

A very general solution, which can handle a variable number of digits at different scales and positions, is to build a model that predicts the bounding boxes of the digits and classifies them. There is a recent line of such models: R-CNN, Fast R-CNN and Faster R-CNN.

You can find a Python implementation of Faster R-CNN on GitHub.

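Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.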

 
© 2020-2024 STACKOOM.COM