
How to determine if numbers in an image are same or different, using Optical Character Recognition?

If I have the following 4 images:

6

5

9

6

How can I determine that the two '6's are the same, but that 5 and 6, 6 and 9, 9 and 5, etc. are not?

The images will always be monochrome (i.e. only black and white, no other colors).

At the moment, I'm simply counting the number of black pixels in the image, and that seems to work okay, but I'm not sure if it's reliable or if there's a better method. In the above example, both '6's have 29 black pixels, while 5 has 26 and 9 has 28, so the difference between 6 and 9 is only 1 pixel. In other fonts, however, 9 and 6 have an identical number of pixels. E.g.:

6

6

have both got an identical number of foreground pixels.

Are you trying to detect exact identicals, or near-identicals / approximate matches (which is what real OCR is about)?

You could start by finding the weighted center (centroid) of each image/glyph, perhaps scaling to a common size for comparability (if you have to match at different sizes), and then comparing pixel-to-pixel similarity (as a percentage) between the two images.
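As a rough sketch of the centroid idea (not a definitive implementation), the class below computes the mean position of the black pixels in each image, shifts one glyph so the two centroids coincide, and then measures how much of the ink overlaps. The class name GlyphAlign, the helper fromRows, and the choice of a Jaccard overlap (shared ink divided by total ink) as the similarity measure are all assumptions made for illustration. Scaling to a common size is left out.

```java
import java.awt.image.BufferedImage;

final class GlyphAlign {
    /** Hypothetical helper: builds an image from text rows; '#' is black, anything else white. */
    static BufferedImage fromRows(String... rows) {
        BufferedImage img = new BufferedImage(rows[0].length(), rows.length, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < rows.length; y++) {
            for (int x = 0; x < rows[y].length(); x++) {
                img.setRGB(x, y, rows[y].charAt(x) == '#' ? 0x000000 : 0xffffff);
            }
        }
        return img;
    }

    /** Ink = a pure black pixel; coordinates outside the image count as background. */
    static boolean isInk(BufferedImage img, int x, int y) {
        if (x < 0 || y < 0 || x >= img.getWidth() || y >= img.getHeight()) return false;
        return (img.getRGB(x, y) & 0xffffff) == 0x000000;
    }

    /** Mean (x, y) position of the black pixels, or null for an image with no ink. */
    static double[] centroid(BufferedImage img) {
        long sumX = 0, sumY = 0, count = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (isInk(img, x, y)) { sumX += x; sumY += y; count++; }
            }
        }
        if (count == 0) return null;
        return new double[] { (double) sumX / count, (double) sumY / count };
    }

    /** Overlap (0.0 to 1.0) of the ink after shifting b so its centroid lands on a's. */
    static double alignedSimilarity(BufferedImage a, BufferedImage b) {
        double[] ca = centroid(a), cb = centroid(b);
        if (ca == null || cb == null) return (ca == null && cb == null) ? 1.0 : 0.0;
        int dx = (int) Math.round(cb[0] - ca[0]);
        int dy = (int) Math.round(cb[1] - ca[1]);
        int both = 0, either = 0;
        // Walk a range, in a's coordinate frame, wide enough to cover both images.
        for (int y = Math.min(0, -dy); y < Math.max(a.getHeight(), b.getHeight() - dy); y++) {
            for (int x = Math.min(0, -dx); x < Math.max(a.getWidth(), b.getWidth() - dx); x++) {
                boolean inkA = isInk(a, x, y);
                boolean inkB = isInk(b, x + dx, y + dy);
                if (inkA && inkB) both++;
                if (inkA || inkB) either++;
            }
        }
        return (double) both / either;
    }
}
```

Calling GlyphAlign.alignedSimilarity(a, b) on two copies of the same glyph that differ only by a small offset should give 1.0, while two glyphs with the same number of black pixels in different positions score lower. Rounding the shift to whole pixels is a simplification; a threshold for "same" would need tuning on real images.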

Of course, if the images are already cropped and sized for you, then you only need to scan them and compare every pixel to get a brute-force "similarity" measure.
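For the pre-cropped case, a brute-force comparison might look something like the following. This is a minimal sketch that assumes both images have exactly the same dimensions; the class and method names (PixelMatch, similarity) are made up for the example.

```java
import java.awt.image.BufferedImage;

final class PixelMatch {
    /** Fraction (0.0 to 1.0) of positions where both images hold the same RGB value. */
    static double similarity(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have equal dimensions");
        }
        int width = a.getWidth();
        int height = a.getHeight();
        int same = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // getRGB() returns packed ARGB; the mask drops the alpha byte.
                int pa = a.getRGB(x, y) & 0xffffff;
                int pb = b.getRGB(x, y) & 0xffffff;
                if (pa == pb) same++;
            }
        }
        return (double) same / (width * height);
    }
}
```

Unlike a black-pixel count, this compares position by position, so a 6 and a 9 with the same number of foreground pixels still score below 1.0 as long as the pixels sit in different places. An exact match is similarity(a, b) == 1.0; for approximate matches you would pick a cutoff that suits your fonts.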

See BufferedImage.getRGB(): http://docs.oracle.com/javase/1.5.0/docs/api/java/awt/image/BufferedImage.html#getRGB(int,%20int)

You can write a function that takes two pixel values from getRGB() (packed ARGB ints; mask with 0xffffff to keep only the RGB part), separates the red, green and blue components, and sums the component differences.
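One way such a function could be written is sketched below. Taking the absolute difference per channel is an assumption (the idea above only says to sum the component differences), and the names PixelDiff, rgbDifference and percentSimilar are hypothetical. The array version expects pixels in the layout produced by BufferedImage.getRGB(0, 0, w, h, null, 0, w).

```java
final class PixelDiff {
    /** Largest possible per-pixel difference: black versus white, 3 * 255. */
    static final int MAX_DIFF = 3 * 255;

    /** Sum of absolute red, green and blue differences; the alpha byte is ignored. */
    static int rgbDifference(int rgb1, int rgb2) {
        int r1 = (rgb1 >> 16) & 0xff, g1 = (rgb1 >> 8) & 0xff, b1 = rgb1 & 0xff;
        int r2 = (rgb2 >> 16) & 0xff, g2 = (rgb2 >> 8) & 0xff, b2 = rgb2 & 0xff;
        return Math.abs(r1 - r2) + Math.abs(g1 - g2) + Math.abs(b1 - b2);
    }

    /** Percentage similarity (0 to 100) between two equal-length, non-empty pixel arrays. */
    static double percentSimilar(int[] pixelsA, int[] pixelsB) {
        if (pixelsA.length != pixelsB.length || pixelsA.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        long total = 0;
        for (int i = 0; i < pixelsA.length; i++) {
            total += rgbDifference(pixelsA[i], pixelsB[i]);
        }
        return 100.0 * (1.0 - (double) total / ((long) MAX_DIFF * pixelsA.length));
    }
}
```

For strictly black-and-white images each pixel contributes either 0 or 765, so this reduces to counting mismatched pixels; the per-channel sum only matters if anti-aliased or grayscale pixels show up.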
