Hi, I'm a novice programmer who's having trouble with simple image processing.
My goal here is to make the program recognize that the two A's are... well, both A's. If you look carefully, you'll see that they differ slightly at the pixel level. Although any literate person can read both as 'A', I'm sure that a program that compares them pixel by pixel will not work, because the two A's are actually different. To make things worse, the two images have different dimensions: one is 48*60, the other is 48*61.
I wonder if there is a way for a program to 'read' both as A's. I have heard that this falls under something called computer vision (I'm not so sure)... I would really prefer the method to be simple: it is not about identifying arbitrary characters, only 'A'. But if it can't be that simple, any explanation of how to make the computer see both of these as A's is really welcome.
Thanks in advance :)
First: character recognition not only isn't a simple problem, it's not a completely solved problem.
Are there many OCR implementations? Yes. Are those implementations good? It depends on the application. The more generalized you think OCR should be, the worse existing implementations look.
Long story short, there are books dedicated to this very subject, and it takes a book of some length to provide answers in any level of meaningful detail.
There are quite a few techniques for OCR (optical character recognition). Different techniques have been developed for (a) machine-printed characters versus (b) hand-written characters. Reading machine-printed characters is generally easier, but not necessarily easy. Reading handwritten characters can be very hard, and remains an incompletely solved problem. Keep in mind that there are other "scripts" (systems of characters for writing), and recognition techniques for Latin characters may be different than recognition techniques for traditional Chinese characters. [If you could write a mobile OCR application to read handwritten Chinese characters quickly and accurately, you could make a pile of money.]
https://en.wikipedia.org/wiki/Optical_character_recognition
There are quite a few approaches to OCR, and if you're interested in actually writing code to perform OCR then naturally you should consider implementing at least one of the simpler techniques first. From your comments it sounds like you're already looking into that, but briefly: do NOT look at neural networks first. Yes, you'll probably end up there, but there's much to learn about imaging, lighting, and basic image processing before you can put neural network techniques to much use.
But before you go any deeper, take some time to try to solve the problem yourself.
After tinkering for a bit, read up on some basic image processing techniques. A good book is Digital Image Processing by Gonzalez and Woods.
(Normalized correlation is a simple algorithm you can read about online and in books. It's useful for certain simple types of OCR. You can think of normalized correlation as a method of comparing a "stencil" of a reference 'A' character to samples of other characters that may or may not be 'A' characters--the closer the stencil matches the sample, the higher the confidence the sample is an A.
So yes, try using OpenCV's template matching. First tinker with the OpenCV functions and learn when template matching works and when it fails, and then look more closely at the code.)
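To make the "stencil" idea concrete, here is a minimal sketch of zero-mean normalized correlation in plain Python. It is not OpenCV's implementation (there, `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` computes a comparable score much faster). The tiny 5-pixel-wide letters, the nearest-neighbour resize used to cope with the 48*60 versus 48*61 mismatch, and all function names are assumptions made purely for illustration.

```python
import math


def resize_nearest(img, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equally sized images, in [-1, 1]."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )
    return num / den if den else 0.0  # a flat image carries no shape information


# Hypothetical reference 'A' (the stencil), 5 rows by 5 columns.
stencil = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
]
# A slightly different 'A' that is one row taller, like 48*60 versus 48*61.
sample = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
]
# A different letter ('L') for contrast.
other = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
]

# Bring every candidate to the stencil's size before comparing.
h, w = len(stencil), len(stencil[0])
score_a = normalized_correlation(stencil, resize_nearest(sample, h, w))
score_other = normalized_correlation(stencil, resize_nearest(other, h, w))
print(f"A vs A: {score_a:.2f}, A vs L: {score_other:.2f}")
```

The slightly different 'A' scores well above the 'L', even though a strict pixel-by-pixel equality test would reject both. Where to put the acceptance threshold is something you would have to tune on your own images.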
A recent survey of OCR techniques can be found in this book: Character Recognition Systems by Cheriet. It's a good starting point to investigate various algorithms. Some of the techniques will be quite surprising and counter-intuitive.
To learn more about how humans recognize characters--the details of which are often surprising and counter-intuitive--read the book Reading in the Brain by Dehaene. This book is quite readable and requires no special math or programming skills.
Finally, for any OCR algorithm it's important to keep the following in mind:
Good luck!
Your problem looks like optical character recognition. A very common approach for this is the use of a neural network. The neural network will analyse the image and give you probabilities for each letter. But you have to train it first, and neural networks are a subject of active research, so there is not a simple "drop-in" solution I know of.
OK, it is true that there is no simple "drop-in" solution for this problem. I'll try to explain the neural network method in a simple way to clear things up a little. First of all, you need to represent the images in a simpler form. Right now your images are 48*60 grayscale matrices. Consider taking the following actions:
Now apply the boxing method to the results. Divide your 50*50 image into a grid of, for example, 8*8 sections. Count how many character (foreground) pixels fall in each section and put the counts in an 8*8 matrix named C. C is then a simple representation of your original image. Gather some training data and test data, and simply use the Neural Net Pattern Recognition app of MATLAB (you do need to know how an ANN works in order to use this app).
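The boxing step could look roughly like the sketch below, written in plain Python rather than MATLAB. It assumes the image has already been binarized (character pixels non-zero, background zero) and scaled to 50*50. The function name `zone_counts` and the test image are made up for illustration. Because 50 is not divisible by 8, the cells here are not all the same size; that is one possible way to handle the mismatch, not necessarily what the answer intended.

```python
def zone_counts(binary_img, grid=8):
    """Count foreground (non-zero) pixels in each cell of a grid x grid partition."""
    h, w = len(binary_img), len(binary_img[0])
    counts = [[0] * grid for _ in range(grid)]
    for r in range(h):
        for c in range(w):
            if binary_img[r][c]:
                # Map the pixel to its cell; cells differ by at most one
                # row or column in size when h or w is not a multiple of grid.
                counts[r * grid // h][c * grid // w] += 1
    return counts


# Hypothetical 50*50 binary image: a filled 10*10 square standing in for a glyph.
img = [[0] * 50 for _ in range(50)]
for r in range(10, 20):
    for c in range(10, 20):
        img[r][c] = 1

C = zone_counts(img)                           # the 8*8 matrix C from the text
features = [n for row in C for n in row]       # flattened to 64 inputs for a classifier
print(len(features), sum(features))
```

Each image then becomes a vector of 64 numbers regardless of small pixel-level differences, which is the kind of fixed-length input a pattern-recognition network expects.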