For my project I am writing an image preprocessing library for scanned documents. Right now I am stuck on the line-removal feature.
Problem Description: A sample scanned form:
Name* : ______________________________
Age* : ______________________________
Email-ID: |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
The following are the further conditions:
What I am trying to do is detect these lines and remove them without losing the handwritten content.
Solution so far: The current solution is implemented in Java.
I detect these lines using a combination of Canny/Sobel edge detectors and a threshold filter (to make the image bitonal). This gives me a black-and-white array of pixels. I traverse the array and check whether the luminosity of each pixel falls below a specified bin value. If I find 30 such pixels in a row (30 being the minimum line length in pixels), I remove them. I repeat the same for vertical lines, taking into account that they will have cuts left by the horizontal line removal.
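The run-removal step described above could be sketched roughly as follows. This is only an illustration, not the actual code from the project: it assumes the image has already been thresholded into a rectangular boolean mask (true = dark pixel), and the names `LineRemoval` and `removeLongRuns` are made up for the example. It also differs from the two-pass approach in one respect: horizontal and vertical runs are both marked on the original mask before anything is cleared, so the horizontal pass cannot cut the vertical lines.

```java
import java.util.Arrays;

// Hypothetical helper; class and method names are illustrative only.
final class LineRemoval {

    /**
     * Returns a copy of the ink mask with every horizontal or vertical run
     * of at least minLength dark pixels cleared. The input is not modified.
     * Assumes a rectangular mask where true means "dark pixel".
     */
    static boolean[][] removeLongRuns(boolean[][] ink, int minLength) {
        int h = ink.length;
        int w = (h == 0) ? 0 : ink[0].length;
        boolean[][] remove = new boolean[h][w];

        // Mark long horizontal runs. x == w acts as a sentinel that closes
        // a run touching the right border.
        for (int y = 0; y < h; y++) {
            int start = 0;
            for (int x = 0; x <= w; x++) {
                boolean dark = x < w && ink[y][x];
                if (!dark) {
                    if (x - start >= minLength) {
                        Arrays.fill(remove[y], start, x, true);
                    }
                    start = x + 1;
                }
            }
        }

        // Mark long vertical runs, still reading the untouched input mask.
        for (int x = 0; x < w; x++) {
            int start = 0;
            for (int y = 0; y <= h; y++) {
                boolean dark = y < h && ink[y][x];
                if (!dark) {
                    if (y - start >= minLength) {
                        for (int k = start; k < y; k++) {
                            remove[k][x] = true;
                        }
                    }
                    start = y + 1;
                }
            }
        }

        // Clear only the marked pixels.
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out[y][x] = ink[y][x] && !remove[y][x];
            }
        }
        return out;
    }
}
```

Calling `LineRemoval.removeLongRuns(mask, 30)` would correspond to the 30-pixel minimum mentioned above. Note that a pixel where handwriting crosses a detected line is cleared together with the line, so this sketch on its own does not preserve strokes at the crossings.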
The solution seems to work, but there are problems such as:
Kindly advise how to do this properly and efficiently. If there is an open-source library for this, please point me to it.
Thanks
First, I want to mention that I know nothing about image processing in general, and about OCR in particular.
Still, a very simple heuristic comes to mind:
The only problem I can see is when somebody writes letters on a horizontal line, like so:
/\ ___
/ \ / \
|__| |___/
-|--|---|---|------------------
| | \__/
In that case the line would remain, but you have to handle this case anyhow.
As I mentioned, I'm by no means an image processing expert, but sometimes very simple tricks work.