
Tesseract not recognizing digits correctly from OpenCV preprocessing

[image]

I'm trying to OCR these digits. However, tesseract is not recognizing them properly.

import cv2
from pytesseract import image_to_string

image = cv2.imread('PATHTOIMAGE', cv2.IMREAD_COLOR)
image = cv2.resize(image, None, fx=5, fy=5, interpolation=cv2.INTER_CUBIC)
gaussian = cv2.GaussianBlur(image, (5, 5), 2)
mask = cv2.inRange(gaussian, (250, 250, 250), (255, 255, 255))
ocr = image_to_string(mask, config='-c tessedit_char_whitelist=0123456789')
print(ocr)

The masking result is the following: [image]

OCR result: 88311

I tried performing some morphological operations from here (dilating and opening), but no luck.

I also tried to detect contours and detect digit by digit, but also no luck.

How else could I improve?

I was able to achieve correct results without the scaling and Gaussian blur steps. I also inverted the mask and passed only --psm 6 as the Tesseract config:

image = cv2.imread('PATHTOIMAGE', cv2.IMREAD_COLOR)
mask = cv2.inRange(image, (250, 250, 250), (255, 255, 255))
ocr = image_to_string(~mask, config='--psm 6')
print(ocr)

For what it's worth, I cannot reproduce the 2 vs 8 confusion. I am running Tesseract 4.1.1 and pytesseract 0.3.8 on Windows.
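
A likely reason the inversion helps: cv2.inRange marks the near-white digit pixels as 255 on a 0 background, so the mask is light text on dark, while Tesseract 4.x generally does better with dark text on a light background. The sketch below shows what `~mask` does to such a mask. It uses a synthetic array and a hypothetical NumPy-only helper, `near_white_mask`, that stands in for cv2.inRange so the example runs without an image file; it is an illustration, not the code from the question.

```python
import numpy as np

def near_white_mask(bgr, lo=250, hi=255):
    """Return a 0/255 uint8 mask of pixels whose three channels all lie in [lo, hi].

    Hypothetical stand-in for cv2.inRange(bgr, (lo, lo, lo), (hi, hi, hi)).
    """
    inside = np.all((bgr >= lo) & (bgr <= hi), axis=-1)
    return inside.astype(np.uint8) * 255

# Synthetic 4x6 image: dark background with a white 2x2 blob standing in for a digit.
image = np.full((4, 6, 3), 30, dtype=np.uint8)
image[1:3, 2:4] = 255

mask = near_white_mask(image)  # digit pixels are 255, background is 0
inverted = ~mask               # bitwise NOT on uint8 swaps 0 and 255

print(mask)
print(inverted)
```

After the inversion the digit pixels are 0 (black) and the background is 255 (white), which is the polarity passed to image_to_string in the snippet above.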

If you still need to explore other preprocessing steps, consider running a sequence of erosion and dilation operations with asymmetric kernels. For example:

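import numpy as np

# Pass iterations by keyword: the third positional argument of
# cv2.erode / cv2.dilate is dst, not iterations.
kernel = np.ones((6, 2), np.uint8)    # tall, narrow
img = cv2.erode(mask, kernel, iterations=1)

kernel = np.ones((2, 40), np.uint8)   # short, very wide
img = cv2.dilate(img, kernel, iterations=1)

kernel = np.ones((1, 40), np.uint8)
img = cv2.erode(img, kernel, iterations=1)

kernel = np.ones((4, 2), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)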

These kernel sizes are for your scaled image (the 5x resize in the question). The sequence appears to remove the vertical points on the 2 while retaining the other relevant information. You will need to tune the sizes and check that the result works for every digit.
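
To see why an asymmetric kernel is selective, here is a minimal sketch of erosion with a tall, narrow rectangle on a synthetic mask. `erode_rect` is a hypothetical pure-NumPy stand-in for cv2.erode with an all-ones kernel, written so the example runs without OpenCV. It assumes a 0/255 mask and treats pixels beyond the border as foreground, which as far as I know matches the default border handling of cv2.erode.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def erode_rect(binary, kh, kw):
    """Erode a 0/255 uint8 mask with a kh x kw rectangle of ones.

    Hypothetical pure-NumPy stand-in for cv2.erode with np.ones((kh, kw)).
    """
    top, left = kh // 2, kw // 2  # anchor at the kernel centre
    # Pad with 255 so that the image border itself never erodes anything.
    padded = np.pad(
        binary,
        ((top, kh - 1 - top), (left, kw - 1 - left)),
        constant_values=255,
    )
    # Each output pixel is the minimum over its kh x kw neighbourhood.
    return sliding_window_view(padded, (kh, kw)).min(axis=(2, 3))

mask = np.zeros((20, 20), dtype=np.uint8)
mask[4:16, 3:7] = 255    # tall stroke: 12 rows x 4 cols
mask[4:7, 12:16] = 255   # short speck: 3 rows x 4 cols

eroded = erode_rect(mask, 6, 2)  # tall, narrow kernel, as in np.ones((6, 2))

print("stroke survives:", bool(eroded[:, 3:7].any()))
print("speck survives:", bool(eroded[:, 12:16].any()))
```

A foreground pixel survives only if the whole 6x2 window around it is foreground, so anything shorter than 6 rows disappears while tall strokes only get thinner. The later wide kernels in the sequence apply the same idea along the horizontal axis.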
