Neural network [OCR]

I'm looking for general tips about the program I'm currently writing.

The goal is to use a neural network to recognize three letters [D, O, M], or to display "nothing is recognized" if I input anything other than those three.

Here's what I have so far:

A class for a single neuron:

public class neuron
{
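    double[] weights;
    // One shared generator: creating a new Random per neuron in a tight loop
    // can give every neuron the same seed (and so the same weights) on .NET Framework.
    static readonly Random rng = new Random();
    public neuron()
    {
        weights = null;
    }
    public neuron(int size)
    {
        // weights[0] multiplies the constant bias input; weights[1..size] belong to the real inputs.
        weights = new double[size + 1];
        for (int i = 0; i <= size; i++)
        {
            // Uniform in [-0.1, 0.1).
            weights[i] = rng.NextDouble() / 5 - 0.1;
        }
    }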
    public double output(double[] wej)
    {
        double s = 0.0;
        for (int i = 0; i < weights.Length; i++) s += weights[i] * wej[i];
        s = 1 / (1 + Math.Exp(s));
        return s;
    }
}

A class for a layer:

public class layer 
{
    neuron[] tab;
    public layer()
    {
        tab = null;
    }
    public layer(int numNeurons, int numInputs)
    {
        tab = new neuron[numNeurons];
        for (int i = 0; i < numNeurons; i++)
        {
            tab[i] = new neuron(numInputs);
        }
    }
    public double[] compute(double[] wejscia)
    {
        double[] output = new double[tab.Length + 1];
        output[0] = 1;
        for (int i = 1; i <= tab.Length; i++)
        {
            output[i] = tab[i - 1].output(wejscia);
        }
        return output;
    }
}

And finally, a class for the network:

public class network
{
    layer[] layers = null;
    public network(int numLayers, int numInputs, int[] npl)
    {
        layers = new layer[numLayers];
        for (int i = 0; i < numLayers; i++)
        {
            layers[i] = new layer(npl[i], (i == 0) ? numInputs : (npl[i - 1]));
        }

    }
    // inputs must have numInputs + 1 elements, with the constant bias value 1 at index 0.
    public double[] compute(double[] inputs)
    {
        double[] output = layers[0].compute(inputs);
        for (int i = 1; i < layers.Length; i++)
        {
            output = layers[i].compute(output);

        }
        return output;
    }
}

Now for the algorithm I chose:

I have a 200x200 picture box where you can draw a letter (or load one from a JPG file).

I then convert it to a first array (the whole picture) and a second one (with the irrelevant background around the letter cut off), like so:

Bitmap bmp2 = new Bitmap(this.pictureBox1.Image);
// Indexed as [row, column], so the first dimension is the height.
int[,] binaryfrom = new int[bmp2.Height, bmp2.Width];

// Start the minima past the last index and the maxima at -1, so that a
// black pixel in row 0 or column 0 is tracked correctly.
int minrow = bmp2.Height, maxrow = -1, mincol = bmp2.Width, maxcol = -1;
for (int i = 0; i < bmp2.Height; i++)
{
    for (int j = 0; j < bmp2.Width; j++)
    {
        // A pixel counts as part of the letter when its red channel is 0.
        if (bmp2.GetPixel(j, i).R == 0)
        {
            binaryfrom[i, j] = 1;
            if (i < minrow) minrow = i;
            if (i > maxrow) maxrow = i;
            if (j < mincol) mincol = j;
            if (j > maxcol) maxcol = j;
        }
        else
        {
            binaryfrom[i, j] = 0;
        }
    }
}

// Nothing was drawn: fall back to the whole image.
if (maxrow < 0)
{
    minrow = 0; maxrow = bmp2.Height - 1;
    mincol = 0; maxcol = bmp2.Width - 1;
}

// Bounding box of the letter, inclusive on both ends.
int[,] boundaries = new int[maxrow - minrow + 1, maxcol - mincol + 1];

for (int i = 0; i < boundaries.GetLength(0); i++)
{
    for (int j = 0; j < boundaries.GetLength(1); j++)
    {
        boundaries[i, j] = binaryfrom[i + minrow, j + mincol];
    }
}

I then convert it to my final 12x8 array like so (I know I could shorten this a fair bit, but I wanted every step in a separate loop so I can more easily see what went wrong, if anything did):

int[,] finalnet = new int[12, 8];

        int k = 1;
        int l = 1;

        for (int i = 0; i < finalnet.GetLength(0); i++)
        {
            for (int j = 0; j < finalnet.GetLength(1); j++)
            {
                finalnet[i, j] = 0;
            }
        }

        while (k <= finalnet.GetLength(0))
            {
                while (l <= finalnet.GetLength(1))
                {
                    for (int i = (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * (k - 1); i < (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * k; i++)
                    {
                        for (int j = (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * (l - 1); j < (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * l; j++)
                        {
                            if (boundaries[i, j] == 1) finalnet[k-1, l-1] = 1;
                        }
                    }
                    l++;
                }
                l = 1;
                k++;
            }
        // Rows left over when the height is not divisible by the number of target rows.
        if (boundaries.GetLength(0) % finalnet.GetLength(0) != 0)
        {

            k = 1;

            while (k <= finalnet.GetLength(1))
            {
                for (int i = (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * finalnet.GetLength(0); i < boundaries.GetLength(0); i++)
                {
                    for (int j = (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * (k - 1); j < (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * k; j++)
                    {
                        if (boundaries[i, j] == 1) finalnet[finalnet.GetLength(0) - 1, k - 1] = 1;
                    }

                }
                k++;
            }
        }

        if (boundaries.GetLength(1) % finalnet.GetLength(1) != 0)
        {
            k = 1;

            while (k <= finalnet.GetLength(0))
            {
                for (int i = (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * (k - 1); i < (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * k; i++)
                {
                    for (int j = (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * finalnet.GetLength(1); j < boundaries.GetLength(1); j++)
                    {
                        if (boundaries[i, j] == 1) finalnet[k - 1, finalnet.GetLength(1) - 1] = 1;
                    } 
                }
                k++;
            }

            for (int i = (int)(boundaries.GetLength(0) / finalnet.GetLength(0)) * finalnet.GetLength(0); i < boundaries.GetLength(0); i++)
            {
                for (int j = (int)(boundaries.GetLength(1) / finalnet.GetLength(1)) * finalnet.GetLength(1); j < boundaries.GetLength(1); j++)
                {
                    if (boundaries[i, j] == 1) finalnet[finalnet.GetLength(0) - 1, finalnet.GetLength(1) - 1] = 1;
                }
            }
        }

The result is a 12x8 array of 0s and 1s (I can change the size in the code to read it from some form controls), where the 1s form the rough shape of the letter you drew.
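The shrinking step described above can also be written more compactly. The following is only a sketch, not the code from the question: `GridSketch` and `Downsample` are hypothetical names, and instead of fixed-size blocks with separate remainder loops it maps each source pixel to a target cell proportionally, so the results can differ slightly from the loops above near the right and bottom edges.

```csharp
using System;

public static class GridSketch
{
    // Shrinks a 0/1 grid to rows x cols. A target cell becomes 1 if any
    // source pixel that maps onto it is 1.
    public static int[,] Downsample(int[,] source, int rows, int cols)
    {
        int srcRows = source.GetLength(0);
        int srcCols = source.GetLength(1);
        int[,] result = new int[rows, cols]; // C# arrays start out filled with 0

        for (int i = 0; i < srcRows; i++)
        {
            int r = i * rows / srcRows; // proportional target row, always < rows
            for (int j = 0; j < srcCols; j++)
            {
                if (source[i, j] != 1) continue;
                int c = j * cols / srcCols; // proportional target column, always < cols
                result[r, c] = 1;
            }
        }
        return result;
    }
}
```

With the arrays from the question this would be called as `int[,] finalnet = GridSketch.Downsample(boundaries, 12, 8);`. If the cropped letter is smaller than the target grid, some target cells are never hit and stay 0.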

Now my questions are: Is this a correct algorithm? Is my function

1/(1+Math.Exp(x))

a good one to use here? What should the topology be? 2 or 3 layers, and if 3, how many neurons in the hidden layer? I have 96 inputs (every field of the finalnet array), so should I also use 96 neurons in the first layer? Should I have 3 neurons in the final layer or 4 (to account for the "not recognized" case), or is that not necessary?

Thank you for your help.

EDIT: Oh, and I forgot to add: I'm going to train my network using the backpropagation algorithm.

  1. You may need at least 4 layers to get accurate results with backpropagation: 1 input layer, 2 middle (hidden) layers, and an output layer.

  2. A 12 * 8 matrix is too small (you may lose so much data that recognition fails completely). Try something like 16 * 16. If you want to reduce the size, you have to peel off the outer layers of black pixels further.

  3. Think about training the network with your reference characters.

  4. Remember that you have to feed the output back to the input layer and iterate multiple times.
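One way to read items 3 and 4 is the usual backpropagation loop: run an example forward, propagate the output error back through the layers, adjust the weights, and repeat over all reference characters for many passes. The sketch below is a minimal illustration under that assumption. It has a single hidden layer, `TinyNet`, `Train` and `Compute` are hypothetical names that do not come from the question, and it uses the standard logistic sigmoid 1 / (1 + e^(-x)).

```csharp
using System;

public class TinyNet
{
    readonly double[,] w1; // [hidden, inputs + 1], column 0 is the bias weight
    readonly double[,] w2; // [outputs, hidden + 1], column 0 is the bias weight
    readonly int nIn, nHid, nOut;

    public TinyNet(int inputs, int hidden, int outputs, int seed)
    {
        nIn = inputs; nHid = hidden; nOut = outputs;
        var rng = new Random(seed);
        w1 = new double[hidden, inputs + 1];
        w2 = new double[outputs, hidden + 1];
        for (int h = 0; h < hidden; h++)
            for (int i = 0; i <= inputs; i++) w1[h, i] = rng.NextDouble() - 0.5;
        for (int o = 0; o < outputs; o++)
            for (int h = 0; h <= hidden; h++) w2[o, h] = rng.NextDouble() - 0.5;
    }

    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Forward pass; hid receives the hidden activations so training can reuse them.
    double[] Forward(double[] x, double[] hid)
    {
        for (int h = 0; h < nHid; h++)
        {
            double s = w1[h, 0];
            for (int i = 0; i < nIn; i++) s += w1[h, i + 1] * x[i];
            hid[h] = Sigmoid(s);
        }
        var y = new double[nOut];
        for (int o = 0; o < nOut; o++)
        {
            double s = w2[o, 0];
            for (int h = 0; h < nHid; h++) s += w2[o, h + 1] * hid[h];
            y[o] = Sigmoid(s);
        }
        return y;
    }

    public double[] Compute(double[] x) => Forward(x, new double[nHid]);

    // One backpropagation step on one example.
    // Returns the squared error measured before the weights are updated.
    public double Train(double[] x, double[] target, double rate)
    {
        var hid = new double[nHid];
        double[] y = Forward(x, hid);

        double error = 0.0;
        var dOut = new double[nOut];
        for (int o = 0; o < nOut; o++)
        {
            double diff = y[o] - target[o];
            error += diff * diff;
            dOut[o] = diff * y[o] * (1 - y[o]); // derivative of the sigmoid is y * (1 - y)
        }

        // Hidden deltas use the output weights as they were before the update.
        var dHid = new double[nHid];
        for (int h = 0; h < nHid; h++)
        {
            double sum = 0.0;
            for (int o = 0; o < nOut; o++) sum += dOut[o] * w2[o, h + 1];
            dHid[h] = sum * hid[h] * (1 - hid[h]);
        }

        for (int o = 0; o < nOut; o++)
        {
            w2[o, 0] -= rate * dOut[o];
            for (int h = 0; h < nHid; h++) w2[o, h + 1] -= rate * dOut[o] * hid[h];
        }
        for (int h = 0; h < nHid; h++)
        {
            w1[h, 0] -= rate * dHid[h];
            for (int i = 0; i < nIn; i++) w1[h, i + 1] -= rate * dHid[h] * x[i];
        }
        return error;
    }
}
```

Training then means calling `Train` for every reference character, and repeating that whole pass many times until the returned error stops shrinking. The learning rate and the number of passes are tuning choices, not values taken from this thread.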

A while back I created a neural net to recognize digits 0-9 (in Python, sorry), so based on my (short) experience, 3 layers are OK and a 96/50/3 topology will probably do a good job. As for the output layer, it's your choice: you can either backpropagate all 0s when the input image is not a D, O or M, or use a fourth output neuron to indicate that the letter was not recognized. I think the first option is the better one because it's simpler (shorter training time, fewer problems debugging the net...); you just need to apply a threshold below which you classify the image as 'not recognized'.
I also used the sigmoid as the activation function. I didn't try others, but it worked :)
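The threshold idea can be sketched as follows. This is an illustration only: `LetterDecision` and `Classify` are hypothetical names, the order D, O, M of the outputs is an assumption, and the threshold value has to be tuned on real drawings.

```csharp
using System;

public static class LetterDecision
{
    static readonly string[] Letters = { "D", "O", "M" };

    // outputs: one activation per letter, assumed to be in the order D, O, M.
    // threshold: cut-off below which even the strongest output is rejected.
    public static string Classify(double[] outputs, double threshold)
    {
        if (outputs.Length != Letters.Length)
            throw new ArgumentException("Expected one output per letter.");

        // Find the strongest output.
        int best = 0;
        for (int i = 1; i < outputs.Length; i++)
            if (outputs[i] > outputs[best]) best = i;

        return outputs[best] >= threshold ? Letters[best] : "not recognized";
    }
}
```

For example, `LetterDecision.Classify(new[] { 0.1, 0.9, 0.2 }, 0.5)` picks "O", while a vector whose largest value is below 0.5 is reported as "not recognized". Note that `compute` in the question returns a constant 1 at index 0 of its result, so that element would have to be skipped before passing the values in.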
