
OpenCV: Letters and words detection from edge detection image

I am currently dealing with text recognition. Here is a part of binarized image with edge detection (using Canny):

EDIT: I am posting a link to an image. I don't have 10 rep points so I cannot post an image.

EDIT 2: And here's the same piece after thresholding. Honestly, I don't know which approach would be better.


The questions remain the same:

  1. How should I detect certain letters? I need to determine location of every letter and then every word.

  2. Is it a problem that some letters are "open"? I mean that their outlines do not form closed areas.

  3. If I use cv::matchTemplate, does that mean I need 24 templates (one for every letter) + 10 (one for every digit), and then have to loop over my image to find the best correlation?

  4. If both the letters and the squares they are in are 1 pixel wide, what filters / operations should I use to close the open letters? I tried various combinations of dilate and erode, with no effect.

The question is kind of "how do I do OCR with OpenCV?" and the answer is that it's an involved process and quite difficult.

But some pointers. Firstly, it's hard to detect letters which are outlined. Most of the tools are designed for filled letters. But in that image it looks as if filling every loop below a certain size threshold will leave only one non-letter distractor. You can get rid of the non-letter lines because they form one huge connected object.
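To make the "fill all loops under a size threshold" idea concrete, here is a minimal sketch in plain Python, with no OpenCV dependency. It assumes the image is a list of rows with 1 for ink and 0 for background; `fill_small_holes` and `max_hole_area` are names made up for this illustration. In OpenCV itself something like `cv::floodFill` or contour filling would probably do the same job. Note that an outline with a gap in it is not an enclosed region, so it will not be filled.

```python
from collections import deque

def fill_small_holes(img, max_hole_area):
    """Return a copy of img (1 = ink, 0 = background) in which every
    enclosed background region of at most max_hole_area pixels is set to 1."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 0 or seen[sy][sx]:
                continue
            # Collect one 4-connected background region with a breadth-first search.
            region, touches_border = [], False
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 0 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # A region that reaches the image border is the outside, not a loop.
            if not touches_border and len(region) <= max_hole_area:
                for y, x in region:
                    out[y][x] = 1
    return out

# A tiny outlined "letter": a closed 3x3 ring with a one-pixel hole.
outline = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_small_holes(outline, max_hole_area=4)
```

The size threshold is what keeps the large grid cells from being filled along with the letters; a sensible value depends on the letter size in your image.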

Once you've filled the letters, they can be skeletonised.

You can't use morphological operations like open and close very sensibly on images where the details are one pixel wide. You can put the image through the operation, but essentially there is no distinction between detail and noise if all features are one pixel. However once you fill the letters, that problem goes away.
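A small sketch of why that happens, again in plain Python on a 0/1 grid. `erode3x3` is a hypothetical helper written for this example, and it treats pixels outside the image as background, which is an assumption and may differ from how OpenCV's `cv::erode` handles borders by default.

```python
def erode3x3(img):
    """Binary erosion with a 3x3 square: a pixel survives only if it and
    all eight neighbours are 1. Pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ))
    return out

# A one-pixel-wide horizontal stroke.
thin = [[0] * 5, [0] * 5, [1] * 5, [0] * 5, [0] * 5]

# A filled 3x3 blob in the middle of a 5x5 image.
thick = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)] for y in range(5)]

thin_left = sum(map(sum, erode3x3(thin)))    # ink pixels left of the thin stroke
thick_left = sum(map(sum, erode3x3(thick)))  # ink pixels left of the filled blob
```

The thin stroke is erased completely, exactly as a one-pixel speck of noise would be, while the filled blob keeps its centre. An opening (erode, then dilate) therefore has nothing left to grow back from when every feature is one pixel wide.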

This isn't in any way telling you how to do it, just giving some pointers.

As mentioned in the previous answer by malcolm, OCR will work better on filled letters, so you can do the following:

  1. Use your second approach, but take the inverse result and not the one you are showing.

  2. Run connected component labeling.

  3. For each component, run the OCR algorithm.
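The labeling step could look roughly like the sketch below, written in plain Python so it runs without OpenCV. It assumes the inverted image is a list of rows with 1 for letter pixels; `component_boxes`, `min_area` and `max_area` are names invented for this example. In OpenCV, `cv::connectedComponentsWithStats` should give you the same kind of bounding boxes and areas directly.

```python
from collections import deque

def component_boxes(img, min_area=1, max_area=None):
    """Label 8-connected components of 1-pixels and return their bounding
    boxes as (x, y, width, height, area), sorted left to right."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first search over one component, tracking its extent.
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            x0 = x1 = sx
            y0 = y1 = sy
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Skip specks and oversized objects such as grid lines.
            if area >= min_area and (max_area is None or area <= max_area):
                boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1, area))
    return sorted(boxes)

# Two small "letters" above one long line that should be ignored.
page = [
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
letters = component_boxes(page, max_area=6)
```

Each returned box is the region you would crop and hand to whatever OCR step you choose. The `max_area` cut-off is one way to drop the big connected grid.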

To discard outliers, I would try to use the spatial relation between detected letters. They should have other letters horizontally or vertically next to them.
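One possible way to express that neighbour test, as a rough sketch in plain Python. It assumes each detected letter is a bounding box `(x, y, width, height)`; `has_neighbor`, `drop_isolated` and `max_gap` are hypothetical names, and the gap value would need tuning to your letter spacing.

```python
def has_neighbor(i, boxes, max_gap):
    """True if box i has another box beside it (same row) or above/below it
    (same column) with at most max_gap pixels between them."""
    x, y, w, h = boxes[i]
    for j, (ox, oy, ow, oh) in enumerate(boxes):
        if j == i:
            continue
        # Overlap of the vertical / horizontal extents of the two boxes.
        rows_overlap = min(y + h, oy + oh) > max(y, oy)
        cols_overlap = min(x + w, ox + ow) > max(x, ox)
        # Empty space between the boxes along each axis (negative if they overlap).
        h_gap = max(ox - (x + w), x - (ox + ow))
        v_gap = max(oy - (y + h), y - (oy + oh))
        if (rows_overlap and h_gap <= max_gap) or (cols_overlap and v_gap <= max_gap):
            return True
    return False

def drop_isolated(boxes, max_gap):
    """Keep only boxes that have at least one horizontal or vertical neighbour."""
    return [b for i, b in enumerate(boxes) if has_neighbor(i, boxes, max_gap)]

# Three letters in a row plus one stray blob far away.
candidates = [(0, 0, 5, 8), (7, 0, 5, 8), (14, 0, 5, 8), (60, 40, 3, 3)]
kept = drop_isolated(candidates, max_gap=4)
```

The same horizontal gap measurement could also be used to split a row of letters into words, by starting a new word wherever the gap is noticeably larger than the typical letter spacing.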

Good luck
