OpenCV: Text Processing and Noise Removal

Question

I would like to remove the background from an image that contains text, so that the result is text on a white background.

sample of image

So far I have tried converting the image to HSV and masking it with lower and upper bounds, but I can't find bounds that remove all of the background.

Code used so far:

import cv2
import numpy as np


# Load the image in color (flag 1 = cv2.IMREAD_COLOR)
filename = 'img2.png'
img = cv2.imread(filename, 1)

# Convert BGR to HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Define a range of blue in HSV
lower_blue = np.array([110, 50, 50])
upper_blue = np.array([130, 255, 255])
# Threshold the HSV image to keep only blue pixels
mask = cv2.inRange(hsv, lower_blue, upper_blue)
# Bitwise-AND the mask with the original image
res = cv2.bitwise_and(img, img, mask=mask)
cv2.imshow('frame', img)
cv2.imwrite('mask.png', mask)

cv2.waitKey(0)

Is there a better way to do it, or do I have to combine multiple lower and upper bounds to reach my goal?

Answer 1

You could read the image as grayscale and apply a threshold:

import cv2

img = cv2.imread('img2.png', 0)                   # 0 means grayscale
new_img = ((img >= 230) * 255).astype('uint8')    # 230 is the threshold, change as desired
cv2.imwrite('mask.png', new_img)

This transforms the left picture into the right one:

Since your pictures all have pure white letters, you can probably just choose a fairly high constant threshold (0 is black and 255 is white), e.g. 230.
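To make the thresholding step concrete without needing the image file, here is a minimal NumPy-only sketch. The pixel values in `patch` are made up for illustration. Only the constant 230 and the `(img >= 230) * 255` expression come from the answer.

```python
import numpy as np

# Hypothetical 3x4 grayscale patch: bright text pixels on a darker, noisy
# background. The values are invented for this example.
patch = np.array([[ 12, 240, 255, 180],
                  [ 90, 229, 230, 200],
                  [255,  60, 231,   0]], dtype=np.uint8)

THRESHOLD = 230  # same constant as in the answer; tune it per image

# The comparison yields a boolean array; multiplying by 255 gives an integer
# array that is not uint8, so cast it back before handing it to OpenCV.
mask = ((patch >= THRESHOLD) * 255).astype(np.uint8)

# Invert to get black text on a white background.
black_on_white = 255 - mask

print(mask)
print(black_on_white)
```

Note that 229 falls just below the threshold and is treated as background, while 230 is kept as text, so the result is sensitive to where you put the cut.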

EDIT

@Ishara Madhawa had a very nice idea of using a kernel to get rid of the central stripes. However, if you use cv2.morphologyEx with cv2.MORPH_CLOSE (a dilation followed by an erosion) instead of a plain dilation, the letters keep their thickness:

import cv2
import numpy as np

img = cv2.imread('img2.png', 0)
new_img = ((img >= 230) * 255).astype('uint8')
cv2.imwrite('mask.png', 255 - new_img)    # 255-... to get black on white

# Closing with a 5x1 (vertical) kernel fills the horizontal stripes
kernel = np.ones((5, 1), np.uint8)
new_img2 = cv2.morphologyEx(new_img, cv2.MORPH_CLOSE, kernel)
cv2.imwrite('mask2.png', 255 - new_img2)
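To see why closing keeps the stroke thickness while a plain dilation does not, here is a small NumPy-only sketch on a single synthetic pixel column. The helpers `dilate_vertical` and `erode_vertical` are hypothetical names written for this example. They only imitate what a 5x1 kernel does and are not the OpenCV implementations.

```python
import numpy as np

def dilate_vertical(img, size):
    """Maximum over a vertical window of `size` rows (imitates a size x 1 dilation)."""
    pad = size // 2
    padded = np.pad(img, ((pad, pad), (0, 0)), mode="constant", constant_values=0)
    windows = [padded[i:i + img.shape[0]] for i in range(size)]
    return np.max(windows, axis=0)

def erode_vertical(img, size):
    """Minimum over a vertical window of `size` rows (imitates a size x 1 erosion)."""
    pad = size // 2
    padded = np.pad(img, ((pad, pad), (0, 0)), mode="constant", constant_values=255)
    windows = [padded[i:i + img.shape[0]] for i in range(size)]
    return np.min(windows, axis=0)

# Synthetic column: a white stroke (rows 3-10) split by a 2-pixel black stripe.
column = np.array([0, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 0],
                  dtype=np.uint8).reshape(-1, 1)

dilated = dilate_vertical(column, 5)   # stripe filled, but stroke grows
closed = erode_vertical(dilated, 5)    # closing = dilation followed by erosion

print("original:", column[:, 0].tolist())
print("dilated: ", dilated[:, 0].tolist())
print("closed:  ", closed[:, 0].tolist())
```

In this toy case the dilation fills the stripe but extends the stroke by two pixels at each end, whereas the closing fills the stripe and leaves the outer edges of the stroke where they were.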

Answer 2

The following approach thresholds the image and then joins the characters that were split by the stripes.

This is the full code for the solution:

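import cv2
import numpy as np

# Read the image as grayscale
image = cv2.imread('input.png', 0)

# Inverse binary threshold: pixels above 200 become 0, the rest 255
retval, thresh_gray = cv2.threshold(image, thresh=200, maxval=255, type=cv2.THRESH_BINARY_INV)

# Swap black and white in place
cv2.bitwise_not(thresh_gray, thresh_gray)

# Dilate with a 5x1 (vertical) kernel to join the split characters
kernel = np.ones((5, 1), np.uint8)
joined = cv2.dilate(thresh_gray, kernel, iterations=1)
cv2.imshow('joined', joined)
cv2.waitKey(0)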

First, read the image as grayscale.

image = cv2.imread('input.png',0)

output:

After that, apply a threshold to get rid of the background noise. Here I picked a fixed threshold of 200 by hand, because it gave the best result on this image.

retval, thresh_gray = cv2.threshold(image, thresh=200, maxval=255,type=cv2.THRESH_BINARY_INV)

output:

Then, after performing bitwise_not (to swap black and white), dilate with a 5 x 1 kernel (5 rows, 1 column) to join the characters that are split in the middle. The two horizontal lines through the middle of the characters will disappear.

cv2.bitwise_not(thresh_gray,thresh_gray)
kernel = np.ones((5, 1), np.uint8)
joined = cv2.dilate(thresh_gray, kernel, iterations=1)

output:
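As a side note on the threshold and bitwise_not steps above: with maxval=255, an inverse threshold followed by an inversion should give the same result as a single plain binary threshold. The sketch below checks this with NumPy stand-ins for the two OpenCV calls. The `np.where` expressions are my emulation of `cv2.THRESH_BINARY_INV` and `cv2.THRESH_BINARY`, and the sample values are made up.

```python
import numpy as np

# Made-up grayscale values around the threshold used in the answer (200).
gray = np.array([[0, 199, 200],
                 [201, 230, 255]], dtype=np.uint8)
THRESH = 200

# Stand-in for THRESH_BINARY_INV with maxval=255:
# pixels strictly above THRESH become 0, the rest become 255.
inv = np.where(gray > THRESH, 0, 255).astype(np.uint8)

# Stand-in for cv2.bitwise_not on uint8 data: every value v becomes 255 - v.
swapped = np.bitwise_not(inv)

# Stand-in for THRESH_BINARY with maxval=255:
# pixels strictly above THRESH become 255, the rest become 0.
direct = np.where(gray > THRESH, 255, 0).astype(np.uint8)

print(np.array_equal(swapped, direct))
```

Also note that cv2.threshold uses a strict comparison (a pixel equal to 200 counts as background here), while the `img >= 230` expression in the other answer includes the threshold value itself.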

2 answers

solution1
6 ACCEPTED 2018-06-08 22:49:42

solution2
3 2018-06-09 02:09:59
