
Unable to read image text with python tesseract and OpenCV

I am trying to read the text in this image using Python with OpenCV, but I have not been able to.

import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

# file_path is the path to the image; flag 0 loads it as grayscale
img = cv.imread(file_path, 0)

# Smooth out noise before thresholding
img = cv.medianBlur(img, 5)

# Global threshold at 127
ret, th1 = cv.threshold(img, 127, 255, cv.THRESH_BINARY)

# Adaptive thresholds over an 11x11 neighbourhood with constant C = 2
th2 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_MEAN_C,
                           cv.THRESH_BINARY, 11, 2)
th3 = cv.adaptiveThreshold(img, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C,
                           cv.THRESH_BINARY, 11, 2)

titles = ['Original Image', 'Global Thresholding (v = 127)',
          'Adaptive Mean Thresholding', 'Adaptive Gaussian Thresholding']

images = [img, th1, th2, th3]

# Show the four results in a 2x2 grid
for i in range(4):
    plt.subplot(2, 2, i + 1)
    plt.imshow(images[i], 'gray')
    plt.title(titles[i])
    plt.xticks([])
    plt.yticks([])

plt.show()

Is there any way to do this?

Instead of working on the grayscale image, work on the saturation channel of the HSV color space; it makes the subsequent steps easier.

import cv2
import numpy as np

img = cv2.imread(image_path_to_captcha)
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Channel 1 of an HSV image is saturation
s_component = hsv[:, :, 1]

s_component: [image]

Next, apply a Gaussian blur with an appropriate kernel size and sigma value, then threshold the result using Otsu's method.

blur = cv2.GaussianBlur(s_component, (7, 7), 7)
# With THRESH_OTSU the threshold is computed automatically; the 127 passed here is ignored
ret, th3 = cv2.threshold(blur, 127, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

th3: [image]

Next, find the contours in the image above and keep only those above a certain area threshold. They are drawn, filled, onto the black image, which will be used as a mask later on.

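# OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns three values (image, contours, hierarchy)
contours, hierarchy = cv2.findContours(th3, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
# Single-channel black canvas with the same height and width as the input
black = np.zeros((img.shape[0], img.shape[1]), np.uint8)

for contour in contours:
    # Keep only contours with an area above 600, drawn filled in white
    if cv2.contourArea(contour) > 600:
        cv2.drawContours(black, [contour], 0, 255, -1)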

black: [image]

Use the black image as a mask over the thresholded image:

res = cv2.bitwise_and(th3, th3, mask=black)

res: [image]

Finally, thin the strokes by applying a morphological erosion to the result above:

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
erode = cv2.erode(res, kernel, iterations=1)

erode: [image]

The end result may not be exactly what you expect. You can also try experimenting with different morphological operations before drawing the contours.

EDIT

You can also apply a distance transform to the masked image (res) and use the normalized result:

dist = cv2.distanceTransform(res, cv2.DIST_L2, 3)
dst = cv2.normalize(dist, dst=None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)

dst: [image]
