How do I isolate handwritten text from an image using OpenCV and Python?

Question

How do I isolate or crop only the handwritten text using OpenCV and Python for the image:

Handwritten image

I have tried to use:

cv2.findContours

but because of the noise (the background and dirt on the paper) I can't isolate just the paper.

How do I do this?

Answer 1

To smooth a noisy image, the usual approach is to apply a blurring filter. For instance, cv2.GaussianBlur(), cv2.medianBlur(), or cv2.bilateralFilter() can be used to reduce noise such as salt-and-pepper speckle. After blurring, we can threshold to obtain a binary image, then perform morphological operations. From here, we can find contours and filter them by aspect ratio or contour area. To crop the ROI, we can use NumPy slicing.

Detected text

Extracted ROI

Code

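import cv2

# Load the image, convert to grayscale, and suppress speckle with a median blur
image = cv2.imread('1.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.medianBlur(gray, 5)
# Inverted adaptive threshold: dark ink becomes white foreground
thresh = cv2.adaptiveThreshold(blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 8)

# Dilate so that the individual strokes merge into a single blob
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
dilate = cv2.dilate(thresh, kernel, iterations=6)

# Find external contours (the return value differs between OpenCV versions)
# and sort them by area, largest first
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)

# Crop the bounding box of the largest contour, if any contour was found
ROI = None
if cnts:
    x, y, w, h = cv2.boundingRect(cnts[0])
    ROI = image[y:y+h, x:x+w]
    cv2.imwrite('ROI.png', ROI)

cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
if ROI is not None:
    cv2.imshow('ROI', ROI)
cv2.waitKey()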

Answer 2

1. Convert the image to single-channel grayscale.
2. Apply adaptiveThreshold to the image. The handwriting will become black and the rest white.
3. If you want to segment the word as one solid region, also apply morphologyEx with MORPH_CLOSE. You will need to experiment with the kernel (most likely a 3x3 ellipse) and the number of iterations (5-10 is usually fine).

   kernel = cv2.getStructuringElement(shape=cv2.MORPH_ELLIPSE, ksize=(3, 3))
   image = cv2.morphologyEx(image, cv2.MORPH_CLOSE, kernel, iterations=7)

4. Use connectedComponentsWithStats. It will put every character into a separate component. stats will hold the bounding boxes, either for the whole word or (if you skip the morphological closing step) for each connected group of characters.

PS: Let me know if you need a full code example.

2 answers

solution1
2 ACCEPTED 2019-09-27 23:06:18

solution2
1 2022-06-09 12:52:52
