OpenCV and Tesseract do not detect a single-digit number in an image

I am using Tesseract with Python. It recognizes almost all of my images that contain two or more digits or characters.

I don't want to train Tesseract on digits only, because I am recognizing characters too.

But the attached image is not recognized by Tesseract.

I think the problem is caused by the bold border. After removing it, the digit is recognized correctly.

The corrected image: [image]

And here's the code, if you are interested:

import cv2
import numpy as np
import pytesseract


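def discard(image):
    """Zero out connected components wider or taller than 100 px (e.g. the border)."""
    image = np.uint8(image)
    # Label 4-connected components and get each one's bounding-box stats
    _, im_label, stts, _ = cv2.connectedComponentsWithStats(image, connectivity=4)

    # Masks of pixels belonging to components whose width or height exceeds 100 px
    msk1 = np.isin(im_label, np.where(stts[:, cv2.CC_STAT_WIDTH] > 100)[0])
    msk2 = np.isin(im_label, np.where(stts[:, cv2.CC_STAT_HEIGHT] > 100)[0])

    image[(msk1 | msk2)] = 0
    return image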


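# Read the image as grayscale
img = cv2.imread("check_img.jpg", cv2.IMREAD_GRAYSCALE)

# Binarization: invert so dark strokes become white foreground, then threshold
thresh = 255 - img
ret, thresh = cv2.threshold(thresh, 5, 255, cv2.THRESH_BINARY)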

# removing long connected-components
thresh = discard(thresh)

# remove noise
thresh = cv2.medianBlur(thresh, 3)

# invert again
thresh = 255 - thresh

# showing the image
cv2.imshow("img", thresh)

# Using Tesseract OCR
custom_config = r'--oem 3 --psm 6'
text = pytesseract.image_to_string(thresh, config=custom_config)

print(text)

cv2.waitKey(0)
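
To make the border-removal step easier to follow, here is a minimal, dependency-free sketch of the same idea on a toy grid: find each 4-connected component, measure its bounding box, and erase it if it is too wide or too tall. It is an illustration only, not the OpenCV implementation used above. The function name `discard_long_components`, the `max_size` parameter, and the toy data are all hypothetical, and unlike `discard` it returns a copy instead of modifying its input.

```python
from collections import deque


def discard_long_components(grid, max_size):
    """Return a copy of `grid` (list of lists of 0/1) in which every
    4-connected foreground component whose bounding box is wider or
    taller than `max_size` has been set to 0."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    out = [row[:] for row in grid]

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 or seen[r][c]:
                continue
            # Flood-fill one component and collect its pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Bounding-box size of this component.
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            height = max(ys) - min(ys) + 1
            width = max(xs) - min(xs) + 1
            if width > max_size or height > max_size:
                for y, x in pixels:
                    out[y][x] = 0
    return out


# Toy image: a 7x7 frame (the "border") around a 2-pixel blob (the "digit").
toy = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cleaned = discard_long_components(toy, max_size=3)
```

With `max_size=3`, the frame (7 pixels wide) is erased while the small blob survives, which mirrors how the 100-pixel threshold above drops the bold border but keeps the digit. The threshold has to sit between the digit's size and the border's size, so it would likely need adjusting for images at a different resolution.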
