Python Machine Learning Digit Recognition

I'm following along with the code at this site:

https://blog.luisfred.com.br/reconhecimento-de-escrita-manual-com-redes-neurais-convolucionais/

Below is the code the site walks through:

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
import numpy as np
from matplotlib import pyplot as plt
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
K.set_image_dim_ordering('th')
import cv2
import matplotlib.pyplot as plt
# %matplotlib inline  # If you are using Jupyter, this is useful for plotting graphics or figures inside cells

# Divide the data into training and testing subsets.
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Since we are working in grayscale we can
# set the depth to the value 1.
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')
# We normalize our data according to the
# grayscale. The floating point values are in the range [0, 1], instead of [0, 255]
X_train = X_train / 255
X_test = X_test / 255
# Converts y_train and y_test, which are class vectors, to a binary class array (one-hot vectors)
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
# Number of digit types found in MNIST. In this case, the value is 10, corresponding to (0,1,2,3,4,5,6,7,8,9).
num_classes = y_test.shape[1]


def deeper_cnn_model():
    model = Sequential()
    # Conv2D will be our input layer. We can observe that it has
    # 30 feature maps with size of 5 x 5 and an activation function of type ReLU.
    model.add(Conv2D(30, (5, 5), input_shape=(1, 28, 28), activation='relu'))
    # The MaxPooling2D layer will be our second layer, with a sample window of size 2 x 2
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # A new convolutional layer, with 15 feature maps of size 3 x 3, and activation function ReLU
    model.add(Conv2D(15, (3, 3), activation='relu'))
    # A new subsampling with a 2 x 2 pooling window.
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # We include a dropout with a 20% probability (you can try other values)
    model.add(Dropout(0.2))
    # We need to convert the output of the convolutional layer, so that it can be used as input to the densely connected layer that is next.
    # This flattens the structure of the output of the convolutional layers, creating a single long vector of features
    # that will be used by the fully connected layer.
    model.add(Flatten())
    # Fully connected layer with 128 neurons.
    model.add(Dense(128, activation='relu'))
    # Followed by a new fully connected layer with 64 neurons
    model.add(Dense(64, activation='relu'))

    # Followed by a new fully connected layer with 32 neurons
    model.add(Dense(32, activation='relu'))
    # The output layer has the number of neurons compatible with the
    # number of classes to be obtained. Notice that we are using a softmax activation function.
    model.add(Dense(num_classes, activation='softmax', name='preds'))
    # Configure the entire training process of the neural network
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    return model


model = deeper_cnn_model()
model.summary()
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200)
scores = model.evaluate(X_test, y_test, verbose=0)
print("\nacc: %.2f%%" % (scores[1] * 100))


### enhance to check multiple numbers after the training is done

img_pred = cv2.imread('five.JPG', 0)

plt.imshow(img_pred, cmap='gray')
# forces the image to have the input dimensions equal to those used in the training data (28x28)
if img_pred.shape != [28, 28]:
    img2 = cv2.resize(img_pred, (28, 28))
    img_pred = img2.reshape(28, 28, -1)
else:
    img_pred = img_pred.reshape(28, 28, -1)

# Here we again specify depth = 1 and the number of rows and columns, which correspond to the 28x28 image.
img_pred = img_pred.reshape(1, 1, 28, 28)
pred = model.predict_classes(img_pred)
pred_proba = model.predict_proba(img_pred)
pred_proba = "%.2f%%" % (pred_proba[0][pred] * 100)
print(pred[0], "with probability of", pred_proba)

At the end of this I try to make a prediction on the number five I've drawn and imported (I've tried with other hand-drawn numbers as well, with equally poor results):

img_pred = cv2.imread('five.JPG', 0)

plt.imshow(img_pred, cmap='gray')
# forces the image to have the input dimensions equal to those used in the training data (28x28)
if img_pred.shape != [28, 28]:
    img2 = cv2.resize(img_pred, (28, 28))
    img_pred = img2.reshape(28, 28, -1)
else:
    img_pred = img_pred.reshape(28, 28, -1)

# Here we again specify depth = 1 and the number of rows and columns, which correspond to the 28x28 image.
img_pred = img_pred.reshape(1, 1, 28, 28)
pred = model.predict_classes(img_pred)
pred_proba = model.predict_proba(img_pred)
pred_proba = "%.2f%%" % (pred_proba[0][pred] * 100)
print(pred[0], "with probability of", pred_proba)

Here is a look at five.jpg:

[Image: hand-drawn five]

But when I input my own number, the model predicts the wrong digit. Any thoughts as to why this might be? I'll admit I'm new to ML and just starting to dabble with it. My thought was that maybe the centering or the normalization of the image is off. Any help is much appreciated!

Edit1:

An MNIST test digit looks something like this:

[Image: white numbers on black backgrounds]

It looks like you have two issues, which, as you suspected, are related to the pre-processing of your data.

The first is that your image is inverted relative to the training data:

  • After reading in one channel of your .jpg with img_pred = cv2.imread('five.JPG', 0), the background pixels are near-white, with values in the neighborhood of 215-238.
  • If you look at the training data in X_train, the background pixels are all zero, with the numerals white or near-white (values of roughly 210-255).

Try plotting your image next to a few samples from X_train and you will see that they are inverted.
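A numeric check can complement the side-by-side plot. The following is a minimal sketch, assuming that the outermost pixels of each image are mostly background. The helper names border_mean and looks_inverted are hypothetical, and the two arrays are synthetic stand-ins for an X_train sample and for five.JPG, not real data.

```python
import numpy as np

def border_mean(image):
    """Mean intensity of the outermost one-pixel frame of a 2-D image."""
    top, bottom = image[0, :], image[-1, :]
    left, right = image[1:-1, 0], image[1:-1, -1]
    return float(np.concatenate([top, bottom, left, right]).mean())

def looks_inverted(image, reference, midpoint=127.5):
    """True if one image has a bright border and the other a dark one."""
    return (border_mean(image) > midpoint) != (border_mean(reference) > midpoint)

# Synthetic stand-in for an MNIST sample: black background, white stroke.
mnist_like = np.zeros((28, 28), dtype=np.uint8)
mnist_like[8:20, 12:16] = 255

# Synthetic stand-in for the photographed digit: light background, dark stroke.
photo_like = np.full((28, 28), 225, dtype=np.uint8)
photo_like[8:20, 12:16] = 40

print("border means:", border_mean(mnist_like), border_mean(photo_like))
print("inverted relative to training data:", looks_inverted(photo_like, mnist_like))
```

With real data, the same two helpers could be called on img_pred (before the final reshape) and on a single 28x28 slice of X_train.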

The other issue is that the default interpolation in cv2.resize() does not preserve the scaling of your data. After you resize your data, the minimum value jumps up to 60, rather than 0. Compare the values of img_pred.min() and img_pred.max() before and after your resizing step.
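To see why shrinking an image can raise its minimum, here is a small NumPy-only illustration. It is not the algorithm cv2.resize() uses; simple block averaging is swapped in as a stand-in, on the assumption that any downscaling that blends a thin dark stroke with a bright background behaves similarly. The function block_mean_downscale and the synthetic 280x280 image are hypothetical.

```python
import numpy as np

def block_mean_downscale(image, factor):
    """Downscale a 2-D image by averaging non-overlapping factor x factor blocks."""
    h, w = image.shape
    blocks = image.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# Synthetic "photo": light background (230) with a 4-pixel-wide black stroke.
big = np.full((280, 280), 230.0)
big[40:240, 138:142] = 0.0

small = block_mean_downscale(big, 10)

print("before:", big.shape, big.min(), big.max())
print("after: ", small.shape, small.min(), small.max())
```

In this toy case the stroke no longer reaches 0 after downscaling, because each output pixel mixes stroke and background pixels. Printing min() and max() of the real image before and after cv2.resize() would show whether the same thing happens there.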

You can invert and scale your data to look more like the MNIST input data with a function like the following:

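def mnist_bytescale(image):
    # Use float for rescaling
    img_temp = image.astype(np.float32)
    # Re-zero the data so the minimum becomes 0
    img_temp -= img_temp.min()
    # Re-scale to the range 0-255 (the minimum is already 0 after the step above)
    img_temp /= (img_temp.max() - img_temp.min())
    img_temp *= 255
    # Invert, so a light background becomes dark and a dark stroke becomes light
    return 255 - img_temp.astype('uint')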

This will flip your data and linearly scale it from 0 to 255, much like the data that the network is trained on. However, if you plot mnist_bytescale(img_pred), you will notice that the background level in most pixels is still not quite 0, since the background level of your original image is not constant (perhaps due to JPEG compression). If your network still has issues with this flipped and scaled data, you might try using np.clip to zero out the background level and see if that helps.
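One way the np.clip idea could look is sketched below. The helper zero_background, the floor value of 30, and the synthetic input array are all assumptions for illustration; a suitable floor for a real image would have to be chosen by inspecting its background values. The last step divides by 255 and reshapes to (1, 1, 28, 28), mirroring how X_train was prepared in the question.

```python
import numpy as np

def zero_background(inverted, floor):
    """Subtract an assumed background level, clip negatives to 0, and stretch back to 0-255."""
    shifted = np.clip(inverted.astype(np.float32) - floor, 0, None)
    peak = shifted.max()
    if peak > 0:
        shifted *= 255.0 / peak
    return shifted

# Synthetic stand-in for an inverted, rescaled 28x28 image:
# a noisy dark background (values 5-29) with a bright stroke.
rng = np.random.default_rng(0)
inverted = rng.integers(5, 30, size=(28, 28)).astype(np.float32)
inverted[8:20, 12:16] = 255.0

floor = 30  # assumed background level, not derived from real data
cleaned = zero_background(inverted, floor)

# Match the training pre-processing: scale to [0, 1], shape (samples, depth, rows, cols).
model_input = (cleaned / 255.0).reshape(1, 1, 28, 28)

print("background pixels at zero:", int((cleaned == 0).sum()))
print("value range:", cleaned.min(), cleaned.max())
print("input shape:", model_input.shape)
```

Subtracting before clipping keeps the stroke edges smooth instead of cutting them off at a hard threshold, which may or may not matter for this network.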
