简体   繁体   中英

Python Machine Learning Digit Recognition

I'm following along with the code at this site:

https://blog.luisfred.com.br/reconhecimento-de-escrita-manual-com-redes-neurais-convolucionais/

Below is the code the site walks through:

from keras. datasets import mnist
from keras. models import Sequential
from keras. layers import Dense
from keras. layers import Dropout
from keras. layers import Flatten
import numpy as np
from matplotlib import pyplot as plt
from keras. layers . convolutional import Conv2D
from keras. layers . convolutional import MaxPooling2D
from keras. utils import np_utils
from keras import backend as K
K . set_image_dim_ordering ( 'th' )
import cv2
import matplotlib. pyplot as plt
#% inline matplotlib # If you are using Jupyter, it will be useful for plotting graphics or figures inside cells

#Divided the data into subsets of training and testing.
( X_train , y_train ) , ( X_test , y_test ) = mnist. load_data ( )
# Since we are working in gray scale we can
# set the depth to the value 1.
X_train = X_train . reshape ( X_train . shape [ 0 ] , 1 , 28 , 28 ) . astype ( 'float32' )
X_test = X_test . reshape ( X_test . shape [ 0 ] , 1 , 28 , 28 ) . astype ( 'float32' )
# We normalize our data according to the
# gray scale. The floating point values ​​are in the range [0,1], instead of [.255]
X_train = X_train / 255
X_test = X_test / 255
# Converts y_train and y_test, which are class vectors, to a binary class array (one-hot vectors)
y_train = np_utils. to_categorical ( y_train )
y_test = np_utils. to_categorical ( y_test )
# Number of digit types found in MNIST. In this case, the value is 10, corresponding to (0,1,2,3,4,5,6,7,8,9).
num_classes = y_test. shape [ 1 ]


def deeper_cnn_model ( ) :
    model = Sequential ( )
    # Convolution2D will be our input layer. We can observe that it has
    # 30 feature maps with size of 5 × 5 and an activation function of type ReLU.
    model.add ( Conv2D ( 30 , ( 5 , 5 ) , input_shape = ( 1 , 28 , 28 ) , activation = 'relu' ) )
    # The MaxPooling2D layer will be our second layer where we will have a sample window of size 2 x 2
    model.add ( MaxPooling2D ( pool_size = ( 2 , 2 ) ) )
    # A new convolutional layer, with 15 feature maps of size 3 × 3, and activation function ReLU
    model.add ( Conv2D ( 15 , ( 3 , 3 ) , activation = 'relu' ) )
    # A new subsampling with a 2x2 dimension pooling.
    model.add ( MaxPooling2D ( pool_size = ( 2 , 2 ) ) )

    # We include a dropout with a 20% probability (you can try other values)
    model.add ( Dropout ( 0.2 ) )
    # We need to convert the output of the convolutional layer, so that it can be used as input to the densely connected layer that is next.
    # What this does is "flatten / flatten" the structure of the output of the convolutional layers, creating a single long vector of features
    # that will be used by the Fully Connected layer.
    model.add ( Flatten ( ) )
    # Fully connected layer with 128 neurons.
    model.add ( Dense ( 128 , activation = 'relu' ) )
    # Followed by a new fully connected layer with 64 neurons
    model.add ( Dense ( 64 , activation = 'relu' ) )

    # Followed by a new fully connected layer with 32 neurons
    model.add ( Dense ( 32 , activation = 'relu' ) )
    # The output layer has the number of neurons compatible with the
    # number of classes to be obtained. Notice that we are using a softmax activation function,
    model.add ( Dense ( num_classes, activation = 'softmax' , name = 'preds' ) )
    # Configure the entire training process of the neural network
    model.compile ( loss = 'categorical_crossentropy' , optimizer = 'adam' , metrics = [ 'accuracy' ] )

    return model


model = deeper_cnn_model ( )
model.summary ( )
model.fit ( X_train , y_train, validation_data = ( X_test , y_test ) , epochs = 10 , batch_size = 200 )
scores = model. evaluate ( X_test , y_test, verbose = 0 )
print ( "\ nacc:% .2f %%" % (scores [1] * 100))


###enhance to check multiple numbers after the training is done

img_pred = cv2. imread ( 'five.JPG' ,   0 )

plt.imshow(img_pred, cmap='gray')
# forces the image to have the input dimensions equal to those used in the training data (28x28)
if img_pred. shape != [ 28 , 28 ] :
    img2 = cv2. resize ( img_pred, ( 28 , 28 ) )
    img_pred = img2. reshape ( 28 , 28 , - 1 ) ;
else :
    img_pred = img_pred. reshape ( 28 , 28 , - 1 ) ;

# here also we inform the value for the depth = 1, number of rows and columns, which correspond 28x28 of the image.
img_pred = img_pred. reshape ( 1 , 1 , 28 , 28 )
pred = model. predict_classes ( img_pred )
pred_proba = model. predict_proba ( img_pred )
pred_proba = "% .2f %%" % (pred_proba [0] [pred] * 100)
print ( pred [ 0 ] , "with probability of" , pred_proba )

At the end of this I try to make a prediction on the number five I've drawn and imported (I've tried with other hand drawn numbers as well with equally poor results):

img_pred = cv2. imread ( 'five.JPG' ,   0 )

plt.imshow(img_pred, cmap='gray')
# forces the image to have the input dimensions equal to those used in the training data (28x28)
if img_pred. shape != [ 28 , 28 ] :
    img2 = cv2. resize ( img_pred, ( 28 , 28 ) )
    img_pred = img2. reshape ( 28 , 28 , - 1 ) ;
else :
    img_pred = img_pred. reshape ( 28 , 28 , - 1 ) ;

# here also we inform the value for the depth = 1, number of rows and columns, which correspond 28x28 of the image.
img_pred = img_pred. reshape ( 1 , 1 , 28 , 28 )
pred = model. predict_classes ( img_pred )
pred_proba = model. predict_proba ( img_pred )
pred_proba = "% .2f %%" % (pred_proba [0] [pred] * 100)
print ( pred [ 0 ] , "with probability of" , pred_proba )

Here is a look at five.jpg:

hand drawn five image

But when I input my own number the model predicts wrong. Any thoughts as to why this might be? I'll admit I'm new to ML and just starting to dabble with it. My thought was maybe the centering of the image or the normalization of the image is off? Any help is much appreciated!

Edit1:

MNIST test number will look something like this:

white numbers black backgrounds

It looks like you have two issues, which, as you suspected, are related to the pre-processing of your data.

The first is that your image is inverted relative to the training data:

  • After reading in one channel of your .jpg with img_pred = cv2. imread ( 'five.JPG' , 0 ) img_pred = cv2. imread ( 'five.JPG' , 0 ) , the background pixels are near-white with values in the neighborhood of 215-238.
  • If you look at the training data in X_train , the background pixels are all zero, with the numerals as white or near-white (upper 210-255).

Try plotting your image next to some of the selections from X_train and you will see they are inverted.

The other issue is that the default interpolation in cv2.resize() does not preserve the scaling of your data. After you resize your data, the minimum value jumps up to 60, rather than 0. Compare the value of img.pred.min() and img.pred.max() before and after your rescaling step.

You can invert and scale your data to look more like the MNIST input data with a function like the following:

 def mnist_bytescale(image):
    # Use float for rescaling
    img_temp = image.astype(np.float32)
    #Re-zero the data
    img_temp -= img_temp.min()
    #Re-scale and invert
    img_temp /= (img_temp.max()-img_temp.min())
    img_temp *= 255
    return 255 - img_temp.astype('uint')

This will flip your data, and linearly scale it from 0 to 255, much like the data that the network is training on. However, if you plot mnist_bytescale(img_pred) , you will notice that the background level in most pixels is still not quite 0, since the background level of your original image is not constant (perhaps due to JPEG compression.) If your network still has issues with this flipped and scaled data, you might try using np.clip to zero-out the background level and see if that helps.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM