Pytesseract fails to recognize digits in an image

Question

Pytesseract fails to recognize the digits 6 and 8. It recognizes

6 as 5 and
5 as 5,
3 as 8 and
8 as 8,
Oct as 0c: or 0:: and
Wed as Men.

The script used:

config= "-c tessedit_char_whitelist=01234567890.:ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz -psm 3 -oem 0"
text = pytesseract.image_to_string(image, config=config)

I also tried the other psm values from 1 to 12, but no luck. Increasing the contrast results in even more digits not being recognized:

kernel = np.ones((2,2),np.uint8)
dilation = cv2.dilate(im, kernel)#,iterations = 1)
text = pytesseract.image_to_string(dilation, config=config)

Raw data:

After running the script:

After running new script:

Answer 1

Some preprocessing to clean and smooth the image before passing it to Pytesseract can help. Specifically, morphological operations that close small holes and remove noise can improve the image. Applying a sharpening filter, or adjusting the kernel size or type, may also help. I believe --psm 6 is the best choice here since the image is a single uniform block of text. Here's what I get after a simple morph close:

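import cv2
import pytesseract

# Point pytesseract at the Tesseract executable (Windows install path)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load as grayscale, then inverse-threshold so dark text becomes white on black
image = cv2.imread('1.png', 0)
thresh = cv2.threshold(image, 150, 255, cv2.THRESH_BINARY_INV)[1]

# Morphological close with a small rectangular kernel fills small holes in the strokes
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# Invert back to dark text on a light background for OCR
result = 255 - close

# --psm 6: treat the image as a single uniform block of text
data = pytesseract.image_to_string(result, lang='eng', config='--psm 6')
print(data)

# Show the intermediate images for inspection
cv2.imshow('thresh', thresh)
cv2.imshow('result', result)
cv2.imshow('close', close)
cv2.waitKey()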

Answer 1
2 2019-09-05 02:13:01
