Pytesseract fails to recognize '3'

Question

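from PIL import Image
import pytesseract

# Path to the Tesseract executable (Windows install location)
pytesseract.pytesseract.tesseract_cmd = r"C:/tesseract/Tesseract-OCR/tesseract.exe"

# Open the image and print whatever text Tesseract finds
image = Image.open('3.png')
print(pytesseract.image_to_string(image))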

[Image: '3'] [Image: '10']

When I try to read '3.png', the script finishes without any output, but '10.png' is read successfully. I have tried different configs, such as --oem 3 --psm 13, and --oem 1 through 3, but nothing worked. What could cause Tesseract to fail to recognize this number, and what can I change in the code to make it work?
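Rather than trying configs one at a time, a minimal sketch like the following runs the same image through several page segmentation modes (PSMs) and collects what each returns. It assumes pytesseract and the Tesseract binary are installed; `build_config` and `try_psm_modes` are hypothetical helper names, and the mode list is an illustrative selection, not a recommendation.

```python
# Compare Tesseract page segmentation modes on one image.
# Assumes pytesseract + the Tesseract binary are installed.
# build_config / try_psm_modes are hypothetical helper names.


def build_config(psm, oem=3):
    """Return a Tesseract CLI config string, e.g. '--oem 3 --psm 6'."""
    return f"--oem {oem} --psm {psm}"


def try_psm_modes(image, modes=(3, 6, 7, 8, 10, 13), oem=3, ocr=None):
    """Map each PSM to the stripped OCR text Tesseract returns for `image`.

    `ocr` defaults to pytesseract.image_to_string; it is a parameter only so
    the loop can be exercised without a Tesseract install.
    """
    if ocr is None:
        import pytesseract  # lazy import so build_config works without it
        ocr = pytesseract.image_to_string
    results = {}
    for psm in modes:
        text = ocr(image, config=build_config(psm, oem))
        results[psm] = text.strip()  # drop trailing newline / form feed
    return results
```

For example, `try_psm_modes(Image.open('3.png'))` returns a dict keyed by PSM, where an empty string marks a mode that recognized nothing. On Windows, set `pytesseract.pytesseract.tesseract_cmd` first, as in the question, if Tesseract is not on the PATH.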

Answer 1

I think you missed page segmentation mode 6:

6    Assume a single uniform block of text. (Source)

With Tesseract 4.1.1, the result is 3.

Code:

import cv2
import pytesseract

# Load the image
img = cv2.imread("3.png")

# Convert to the gray-scale
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# OCR
txt = pytesseract.image_to_string(gry, config="--psm 6")

# Print
print(pytesseract.get_tesseract_version())
print(txt)

# Display
cv2.imshow("", gry)
cv2.waitKey(0)

Result:

4.1.1
3
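The same idea can be applied to the question's PIL-based code. The sketch below is an assumption-laden adaptation, not the answerer's code: it converts the image to grayscale with PIL's `convert("L")` (roughly the counterpart of `cv2.COLOR_BGR2GRAY`) and passes `--psm 6`. `read_with_psm6` is a hypothetical helper name, and pytesseract plus the Tesseract binary are assumed to be installed.

```python
# PSM 6 ("single uniform block of text") applied to a PIL image.
# Assumes pytesseract + Tesseract are installed; read_with_psm6 is a
# hypothetical helper name.


def read_with_psm6(image, ocr=None):
    """OCR a PIL image as a single uniform block of text (PSM 6)."""
    if ocr is None:
        import pytesseract  # lazy import; `ocr` can be swapped for testing
        ocr = pytesseract.image_to_string
    gray = image.convert("L")  # grayscale, like the answer's cvtColor step
    # Tesseract output may end with a newline and form feed; strip them.
    return ocr(gray, config="--psm 6").strip()
```

Usage would be `read_with_psm6(Image.open('3.png'))` after `from PIL import Image`. Whether it returns '3' still depends on the Tesseract version and the image; the answer only reports that result for 4.1.1 with the OpenCV version.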

solution1
0 2021-05-20 20:14:20