
Training letter images to a neural network with full-batch training

According to this tutorial (pure Python with NumPy), I want to build a simple neural network (a perceptron, kept as minimal as possible for learning purposes) that can be trained to recognize the letter "A". In the tutorial's example, they build a network that learns the logical AND operator. In that case, we have an input matrix (4×3) and an output matrix (4×1).
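For reference, the AND training data described above can be written out in NumPy roughly as follows. This is a sketch, not the tutorial's exact code; it assumes the third column of the 4×3 input matrix is a constant bias input of 1.

```python
import numpy as np

# Each row is one training example: inputs A and B, plus a constant bias input of 1
# (the bias column is an assumption about how the 4x3 input matrix is laid out).
X = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
], dtype=float)

# Target output for each row: A AND B, one row per example.
Y = np.array([[0], [0], [0], [1]], dtype=float)

print(X.shape, Y.shape)  # (4, 3) (4, 1)
```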


Each time, we compute the difference between the output matrix and the input matrix, calculate the error and the update rate, and so on.

Now I want to give an image as the input. In that case, what will my output be? How do I specify that the image is the letter "A"? One solution is to define "1" as "A" and "0" as "not A". But if my output is a scalar, how can I subtract it from the hidden layer, calculate the error, and update the weights? The tutorial uses full-batch training and multiplies the whole input matrix by the weight matrix, and I want to use the same method. The end goal is a neural network that recognizes the letter "A" in the simplest possible form, but I have no idea how to do this.

First off: it's great that you are trying to understand neural networks by programming them from scratch instead of starting off with some complex library. Let me try to clear things up. Your understanding here:

Each time, we compute the difference between the output matrix and the input matrix, calculate the error and the update rate, and so on.

is not really correct. In your example, the input matrix X is what you present to the input of your neural network. The output Y is what you want the network to produce for X: the first element, Y[0], is the desired output for the first row of X, and so on. We often call this the "target vector". To calculate the loss function (i.e., the error), we compare the output of the network (L2 in the linked example code) to the target vector Y. In words, we compare what we want the network to do (Y) with what it actually does (L2). Then we take one step in a direction that brings the network's output closer to Y.
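As a minimal sketch of full-batch training on the AND data, the loop below compares the network output L2 with the target Y (never with X). The names L0/L1/L2, the 4-unit hidden layer, the seed, and the iteration count are assumptions chosen for illustration; the linked tutorial's code may differ in the details.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# AND data with a constant bias column (assumed layout, as in the question).
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
Y = np.array([[0], [0], [0], [1]], dtype=float)

rng = np.random.default_rng(1)
W0 = 2 * rng.random((3, 4)) - 1  # input -> hidden weights (4 hidden units, an assumption)
W1 = 2 * rng.random((4, 1)) - 1  # hidden -> output weights

for _ in range(10_000):
    # Forward pass over the whole batch at once (full-batch training).
    L0 = X
    L1 = sigmoid(L0 @ W0)
    L2 = sigmoid(L1 @ W1)  # network output, one row per training example

    # The error compares the network output L2 with the target Y, not with X.
    L2_error = Y - L2
    L2_delta = L2_error * L2 * (1 - L2)
    L1_delta = (L2_delta @ W1.T) * L1 * (1 - L1)

    # One gradient step that moves L2 closer to Y.
    W1 += L1.T @ L2_delta
    W0 += L0.T @ L1_delta

print(np.round(L2).ravel())
```

Note that the subtraction `Y - L2` is between two column vectors of the same shape, so no scalar-versus-matrix mismatch arises.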

Now, if you want to use an image as the input, you should think of each pixel in the image as one input variable. Previously, we had two input variables, A and B, for which we wanted to compute the target Y = A ∧ B.

Example:

If we take an 8-by-8 pixel image, we have 8*8 = 64 input variables. Thus, our input matrix X should have 65 columns (the 64 pixels of the image plus one bias input, which is always 1) and one row per training example. E.g., if you have one image of each of the 26 letters, the matrix will have 26 rows.
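A sketch of how X could be assembled in NumPy, assuming the images are already loaded as 8×8 arrays. The random placeholder images below are hypothetical stand-ins for real letter bitmaps.

```python
import numpy as np

n_images, height, width = 26, 8, 8

# Hypothetical stand-in data: 26 random black-and-white 8x8 images.
# Replace these with real letter images, one image per row of X.
rng = np.random.default_rng(0)
images = rng.integers(0, 2, size=(n_images, height, width)).astype(float)

# Flatten each 8x8 image into a 64-element row ...
pixels = images.reshape(n_images, height * width)
# ... and append a constant bias input of 1 to every row.
X = np.hstack([pixels, np.ones((n_images, 1))])

print(X.shape)  # (26, 65)
```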

The output (target) vector Y should have the same number of rows as X, i.e., 26 in the previous example. Each element of Y is 1 if the corresponding input row is an A, and 0 if it is another letter. In our example, assuming the images are in alphabetical order, Y[0] would be 1 and Y[1:] would all be 0.
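The matching target vector can be built from a list of labels. This sketch assumes the rows of X are ordered A to Z, so only the first label is "A":

```python
import string

import numpy as np

# One label per row of X, in the same order as the images (assumed here to be A..Z).
labels = np.array(list(string.ascii_uppercase))

# Target is 1 for an "A" and 0 for every other letter, stored as a column vector.
Y = (labels == "A").astype(float).reshape(-1, 1)

print(Y.shape)        # (26, 1)
print(Y[:3].ravel())  # [1. 0. 0.]
```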

Now you can use the same code as before: the output L2 will be a vector containing the network's predictions, one per image, which you can then compare to Y as before.

tl;dr The key idea is to forget that an image is 2D, and store each input image as a vector.
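A tiny illustration of the flattening step, using a made-up 3×3 "image": NumPy's `ravel` reads the pixels row by row, and `reshape` can restore the 2D layout if you ever need it for display.

```python
import numpy as np

# A tiny 3x3 "image" standing in for a real letter bitmap.
image = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
])

vector = image.ravel()  # row-major order: first row, then second row, then third
print(vector)           # [0 1 0 1 0 1 1 1 1]

# The 2D layout is recoverable, but the network itself never needs it.
restored = vector.reshape(3, 3)
print(np.array_equal(restored, image))  # True
```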

The technical post webpages of this site follow the CC BY-SA 4.0 protocol.

© 2020-2024 STACKOOM.COM