
Computer Vision - Recognize 'A' from an image of 'A'

['A' extracted from File_01]

['A' extracted from File_02]

Hi, I'm a novice programmer who's having trouble with simple image processing.

My goal here is to make the program recognize that the two A's are... well, both A's. If you look carefully, you'll see that they are slightly different (at the pixel level). Although any literate person can read both as 'A', I'm sure that a program that compares them pixel by pixel won't work, because the two A's really are different. To make things worse, they have different dimensions - one is 48*60, the other is 48*61.

I wonder if there is a way for a program to 'read' both of them as A's. I have heard this is part of something called computer vision (not so sure)... I would really prefer the method to be simple - it's not about identifying arbitrary characters, only 'A'. But if that isn't possible, any explanation of how to make the computer see both of these as A's is really welcome.

Thanks in advance :)

First: character recognition not only isn't a simple problem, it's not a completely solved problem.

Are there many OCR implementations? Yes. Are those implementations good? It depends on the application. The more generalized you think OCR should be, the worse existing implementations look.

Long story short, there are books dedicated to this very subject, and it takes a book of some length to provide answers in any level of meaningful detail.

There are quite a few techniques for OCR (optical character recognition). Different techniques have been developed for (a) machine-printed characters versus (b) hand-written characters. Reading machine-printed characters is generally easier, but not necessarily easy. Reading handwritten characters can be very hard, and remains an incompletely solved problem. Keep in mind that there are other "scripts" (systems of characters for writing), and recognition techniques for Latin characters may be different than recognition techniques for traditional Chinese characters. [If you could write a mobile OCR application to read handwritten Chinese characters quickly and accurately, you could make a pile of money.]

https://en.wikipedia.org/wiki/Optical_character_recognition

There are quite a few approaches to OCR, and if you're interested in actually writing code to perform OCR, then naturally you should consider implementing at least one of the simpler techniques first. From your comments it sounds like you're already looking into that, but briefly: do NOT look at neural networks first. Yes, you'll probably end up there, but there's much to learn about imaging, lighting, and basic image processing before you can put neural network techniques to much use.

But before you get in too deep, take some time to try to solve the problem yourself:

  1. Write code yourself (don't use someone else's code) to load an image from file into memory.
  2. Represent the image as a 2D array in memory.
  3. Think of ways you might distinguish just a few characters or shapes from one another. First assume those characters are perfectly reproduced. For example, if an image contains multiple exact copies of the characters "1" and "2," what is the simplest way you can imagine distinguishing those characters?
  4. Consider the same problem, but with characters that are only slightly different. For example, add a few "noise" pixels to each character. (A rough sketch of these four steps follows the list.)
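
To make those four steps concrete, here is a minimal sketch in Python (my own illustration, not part of the original answer). It assumes the characters are saved as plain-text PGM (P2) files so that no image library is needed; the file names one.pgm and two.pgm and the darkness threshold of 128 are made up.

    # Steps 1 and 2: read an ASCII PGM file into a 2D list of pixel values.
    def load_pgm(path):
        with open(path) as f:
            tokens = [t for line in f
                        for t in line.split('#')[0].split()]   # strip comments
        assert tokens[0] == 'P2', 'expecting a plain-text (P2) PGM file'
        width, height = int(tokens[1]), int(tokens[2])          # tokens[3] is maxval
        values = list(map(int, tokens[4:]))
        return [values[r * width:(r + 1) * width] for r in range(height)]

    # Step 3: the simplest feature imaginable -- how much "ink" a character uses.
    # A cleanly printed '1' covers far fewer dark pixels than a '2'.
    def ink_count(image, threshold=128):
        return sum(1 for row in image for pixel in row if pixel < threshold)

    one = load_pgm('one.pgm')   # hypothetical sample files
    two = load_pgm('two.pgm')
    print(ink_count(one), ink_count(two))

    # Step 4: sprinkle a few random "noise" pixels into each image and re-run;
    # the counts shift only slightly, so a threshold on ink_count still
    # separates the two characters.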

After tinkering for a bit, read up on some basic image processing techniques. A good book is Digital Image Processing by Gonzalez and Woods.

(Normalized correlation is a simple algorithm you can read about online and in books. It's useful for certain simple types of OCR. You can think of normalized correlation as a method of comparing a "stencil" of a reference 'A' character to samples of other characters that may or may not be 'A' characters--the closer the stencil matches the sample, the higher the confidence the sample is an A.

So yes, try using OpenCV's template matching. First tinker with the OpenCV functions and learn when template matching works and when it fails, and then look more closely at the code.)
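
For a rough idea of what that first experiment might look like, here is a short sketch in Python using OpenCV (my own illustration; the file names and the 0.8 acceptance threshold are made up). cv2.matchTemplate with cv2.TM_CCOEFF_NORMED computes the kind of normalized correlation score described above.

    import cv2

    # Hypothetical files: reference_A.png is a cropped 'A' used as the stencil,
    # unknown_char.png is the character to be classified.
    template = cv2.imread('reference_A.png', cv2.IMREAD_GRAYSCALE)
    sample = cv2.imread('unknown_char.png', cv2.IMREAD_GRAYSCALE)

    # The two images in the question differ by one row (48*60 vs 48*61),
    # so bring the sample to the template's size before comparing.
    sample = cv2.resize(sample, (template.shape[1], template.shape[0]))

    # Normalized correlation: a score near 1.0 means the stencil matches well.
    score = cv2.matchTemplate(sample, template, cv2.TM_CCOEFF_NORMED).max()
    print('match score:', score)
    if score > 0.8:                        # threshold found by experimentation
        print("Looks like an 'A'")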

A recent survey of OCR techniques can be found in this book: Character Recognition Systems by Cheriet. It's a good starting point to investigate various algorithms. Some of the techniques will be quite surprising and counter-intuitive.

To learn more about how humans recognize characters--the details of which are often surprising and counter-intuitive--read the book Reading in the Brain by Dehaene. This book is quite readable and requires no special math or programming skills.

Finally, for any OCR algorithm it's important to keep the following in mind:

  1. Image quality is important. Control image acquisition and lighting as best you can. Develop a good gut feeling for the effects of light, shadow, etc., on OCR results.
  2. Set a goal for read rate accuracy. To avoid frustration, set a LOW goal at first--perhaps just 50%. There are various techniques for calculating what "accurate" means, but to start you can simply calculate the percentage of characters correctly identified or the percentage of words correctly identified. Achieving a read rate of 98% is not easy, and for some applications even that read rate is not particularly useful.
  3. Recognizing words adds another layer of complexity.
  4. It takes a long time to learn OCR in any depth. Take your time.
  5. Always revisit assumptions about how OCR algorithms "should" be written. Even if an implementation is clever in steps 2, 3, 4, and 5, a bone-headed choice for step 1 will hobble the overall implementation.

Good luck!

Your problem looks like optical character recognition. A very common approach for this is the use of a neural network. The neural network will analyse the image and give you probabilities for each letter. But you have to train it first, and neural networks are a subject of active research, so there is not a simple "drop-in" solution I know of.
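
To make "probabilities for each letter" concrete, here is a tiny sketch using scikit-learn's MLPClassifier as a stand-in for a hand-built neural network (my illustration, not the answerer's code). The training data below is random placeholder data; in practice X would hold flattened, normalized character images and y the letters they show, labelled by hand.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Placeholder training set: 200 "images" flattened to 256 values (16*16),
    # each labelled with one of three letters.  Real data must come from
    # cropped, labelled character images.
    rng = np.random.default_rng(0)
    X = rng.random((200, 256))
    y = rng.choice(list('ABC'), size=200)

    net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, y)

    sample = rng.random((1, 256))             # one new, unknown character
    for letter, p in zip(net.classes_, net.predict_proba(sample)[0]):
        print(f'P({letter}) = {p:.2f}')       # one probability per letter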

OK, it's true that there is no simple "drop-in" solution for this problem. I'll try to explain the neural network method in a simple way to clear things up a little bit. First of all, you need to represent the images in a simpler way. Right now your images are 48*60 grayscale matrices; consider taking the following actions:

  • Turn them into binary images.
  • Resize them all to 50*50.
  • Use morphological operations to thin the letters to one-pixel width (search for "thinning" or "skeletonization").

Now apply the boxing method to the results: divide your 50*50 image into, for example, an 8*8 grid of sections, count how many foreground pixels fall into each section, and put the result in an 8*8 matrix named C. Now you have an 8-by-8 matrix C that is a simple representation of your original image. Gather some training and test data and simply use MATLAB's Neural Net Pattern Recognition app (you do need to know how an ANN works in order to use this app).
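
The workflow above uses MATLAB; purely as a hedged illustration of the binarize / resize / boxing steps, here is roughly what they could look like in Python with OpenCV and NumPy. The 50*50 size and 8*8 grid come from the answer; the file path, Otsu thresholding, and the use of np.array_split are my assumptions.

    import cv2
    import numpy as np

    def box_features(path, size=50, grid=8):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        # Binarize (Otsu picks the threshold); the letter becomes the non-zero pixels.
        _, binary = cv2.threshold(img, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        # Resize every sample to the same dimensions.
        binary = cv2.resize(binary, (size, size), interpolation=cv2.INTER_NEAREST)
        # Thinning to one-pixel width would go here, e.g. cv2.ximgproc.thinning()
        # if the opencv-contrib package is installed.
        # Boxing: split into a grid*grid layout and count "on" pixels per zone.
        C = np.zeros((grid, grid), dtype=int)
        for i, rows in enumerate(np.array_split(binary, grid, axis=0)):
            for j, zone in enumerate(np.array_split(rows, grid, axis=1)):
                C[i, j] = np.count_nonzero(zone)
        return C            # the 8*8 matrix C; flatten it to feed a neural network

Each character then becomes a 64-value feature vector, which is small enough to train a classifier or neural network on.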
