Remove surrounding lines and background graphic noise from handwritten text

Question

I am trying to remove rules and a background smiley face from multiple notebook pages before performing text detection and recognition on the handwritten text.

An earlier thread offers helpful hints, but my problem is different in several respects.

The text to keep is written over the background items to be removed.
The items to be removed have distinct colors from that of the text, which may be the key to their removal.
The lines to be removed are not very straight, and the smiley face even less so.

I'm thinking of using OpenCV for this task, but I'm open to using ImageMagick or command-line GIMP so long as I can process the entire batch at once. Since I have never used any of these tools before, any advice would be welcome. Thank you.

Answer 1

Here's a simple approach, assuming the text is blue:

Convert image to HSV format and color threshold with cv2.inRange()
Perform morphological transformations to smooth image
Isolate characters
Recolor characters for OCR/Tesseract

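We begin by converting the image to HSV format and creating a mask to isolate the characters. The range below keeps hues from 21 to 179 (OpenCV's hue scale for 8-bit images runs from 0 to 179) and values up to 209, so low-hue reds and oranges and bright pixels such as the paper are excluded: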

image = cv2.imread('1.png')
result = image.copy()
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([21,0,0])
upper = np.array([179, 255, 209])
mask = cv2.inRange(image, lower, upper)

Next we apply a morphological close, which fills small gaps and holes in the character strokes:

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2,2))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)

We now have the desired text mask, so we can isolate the characters by setting every pixel outside the mask to white in a copy of the original image:

result[close==0] = (255,255,255)

Finally, to prepare the image for OCR with Tesseract, we change the characters to black:

retouch_mask = (result <= [250.,250.,250.]).all(axis=2)
result[retouch_mask] = [0,0,0]
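
One thing to be aware of with this comparison: a pixel is recolored only when all three channels are at or below 250. The sketch below wraps the same test in a hypothetical helper, blacken_non_white (the name and the sample pixels are made up for illustration), to show that a pale pixel with one saturated channel is left alone.

```python
import numpy as np


def blacken_non_white(bgr, threshold=250):
    """Return a copy where every pixel with all channels <= threshold is black."""
    out = bgr.copy()
    non_white = (out <= threshold).all(axis=2)  # True only if B, G and R all pass
    out[non_white] = (0, 0, 0)
    return out


# Hypothetical 1x3 image: white paper, dark ink, pale pixel with B = 255
pixels = np.array([[
    [255, 255, 255],
    [150, 50, 40],
    [255, 200, 200],
]], dtype=np.uint8)

cleaned = blacken_non_white(pixels)
print(cleaned[0])
```

In the pipeline above this should rarely matter, because pixels kept by the HSV mask have a value (brightest channel) of at most 209. Only pixels added by the closing step could be brighter than that.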

Full code

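import numpy as np
import cv2

# Load the page and keep an untouched BGR copy for the final output
image = cv2.imread('1.png')
result = image.copy()

# Threshold in HSV space to build a mask of the (assumed blue) ink
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([21,0,0])
upper = np.array([179, 255, 209])
mask = cv2.inRange(image, lower, upper)

# Morphological close to fill small gaps in the strokes
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2,2))
close = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)

# Paint everything outside the mask white
result[close==0] = (255,255,255)

cv2.imshow('cleaned', result)

# Turn the remaining non-white pixels black for OCR
retouch_mask = (result <= [250.,250.,250.]).all(axis=2)
result[retouch_mask] = [0,0,0]

# Show the intermediate and final images; press any key to close the windows
cv2.imshow('mask', mask)
cv2.imshow('close', close)
cv2.imshow('result', result)
cv2.waitKey()
cv2.destroyAllWindows()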

solution1
2 2019-08-15 01:00:42
