Pytesseract fails to recognize '3'

from PIL import Image
import pytesseract

# Point pytesseract at the Tesseract executable (Windows install path)
pytesseract.pytesseract.tesseract_cmd = r"C:/tesseract/Tesseract-OCR/tesseract.exe"

# Run OCR with the default settings
image = Image.open('3.png')
print(pytesseract.image_to_string(image))

[Image with '3'] [Image with '10']

When I try to read '3.png', the script finishes without printing anything, but when I try to read '10.png', it is read successfully. I have tried running it with different configs: --oem 3 -psm 13, and I also tried --oem 1 to 3, but nothing worked. What could cause it to fail to recognize this number, and what can I change in the code to make it work?

I think you missed page segmentation mode 6:

6    Assume a single uniform block of text. (Source)

With Tesseract version 4.1.1, the result is 3.

Code:

import cv2
import pytesseract

# Load the image
img = cv2.imread("3.png")

# Convert to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# OCR
txt = pytesseract.image_to_string(gry, config="--psm 6")

# Print
print(pytesseract.get_tesseract_version())
print(txt)

# Display
cv2.imshow("", gry)
cv2.waitKey(0)

Result:

4.1.1
3
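
If mode 6 does not help on a different image or Tesseract version, it can be useful to compare several page segmentation modes side by side. The following is only a rough sketch: the helper name ocr_with_psm_modes and the default list of modes (6, 7, 10, 13) are assumptions made for illustration, not part of the answer above. The OCR function is passed in as an argument, so pytesseract.image_to_string can be used directly.

```python
from typing import Callable, Dict, Iterable


def ocr_with_psm_modes(
    image,
    ocr: Callable[..., str],
    modes: Iterable[int] = (6, 7, 10, 13),
) -> Dict[int, str]:
    """Run `ocr` once per page segmentation mode and collect the stripped text.

    `ocr` is expected to accept (image, config=...) like
    pytesseract.image_to_string does. The helper itself is hypothetical.
    """
    results = {}
    for psm in modes:
        # Tesseract 4+ expects the double-dash form: --psm N
        config = f"--psm {psm}"
        results[psm] = ocr(image, config=config).strip()
    return results


# Usage (assumes cv2 and pytesseract are installed and 3.png exists):
# import cv2, pytesseract
# gry = cv2.cvtColor(cv2.imread("3.png"), cv2.COLOR_BGR2GRAY)
# for psm, text in ocr_with_psm_modes(gry, pytesseract.image_to_string).items():
#     print(psm, repr(text))
```

An empty string for a given mode means Tesseract returned no text under that segmentation assumption, which matches the symptom described in the question.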
