Why does pytesseract not recognize correctly?

Question

OK, so I've been trying to transform my image in every way I can think of, but I can't seem to find the right settings.

This is the image:

As you can see, the picture is already about as simple as it gets, but Tesseract still cannot recognize '1 BB' in it. Any tips?

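import numpy as np
import pytesseract
from PIL import Image, ImageEnhance

# img is assumed to be a 2-D grayscale NumPy array created earlier
img = Image.fromarray(img)
imp_arr = np.asarray(img)
# Binarize: pixels below 140 become 0 (black), the rest become 255 (white)
imp_arr = (np.floor(imp_arr / 140.0) * 255.0).astype('uint8')
img = Image.fromarray(imp_arr, mode='L')
# Upscale 3x, then 2x, then shrink to 30% (net scale of about 1.8x)
width, height = img.size
img = img.resize((width*3, height*3), Image.BICUBIC)
width, height = img.size
img = img.resize((width*2, height*2), Image.HAMMING)
width, height = img.size
img = img.resize((int(width*0.3), int(height*0.3)), Image.BICUBIC)
# Adjust brightness, sharpness and contrast
img = ImageEnhance.Brightness(img).enhance(0.7)
img = ImageEnhance.Sharpness(img).enhance(2)
img = ImageEnhance.Contrast(img).enhance(2)
# --psm 10 treats the image as a single character; the whitelist allows digits only
amount = pytesseract.image_to_string(img, config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')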

This is just one example of what I've tried in order to get the correct text out of the image. Sometimes it works; other times it prints gibberish. The thing is, it needs to work every single time, especially for a picture as clear as this one. Is there a mastermind out there with a simple solution to this problem? Thank you in advance.
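For reference, the binarization line in the question, (np.floor(imp_arr / 140.0) * 255.0).astype('uint8'), is a fixed threshold at 140: every pixel value from 0 to 139 maps to 0 and every value from 140 to 255 maps to 255. The sketch below checks that claim against the more explicit np.where form for all 256 possible uint8 values. The function names floor_threshold and where_threshold are hypothetical, chosen only for illustration.

```python
import numpy as np


def floor_threshold(arr, cutoff=140.0):
    """The expression from the question: floor(v / cutoff) * 255."""
    return (np.floor(arr / cutoff) * 255.0).astype("uint8")


def where_threshold(arr, cutoff=140):
    """Explicit two-level threshold: 255 at or above cutoff, else 0."""
    return np.where(arr >= cutoff, 255, 0).astype("uint8")


# Every possible 8-bit grayscale value.
values = np.arange(256, dtype=np.uint8)

print(np.array_equal(floor_threshold(values), where_threshold(values)))
```

The floor form only behaves as a two-level threshold because 255 / 140 is less than 2. With an integer cutoff of 127 or lower, floor(255 / cutoff) becomes 2 and the product (510) no longer fits in uint8, so the np.where form is the safer way to write it.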

Answer 1

After installing Tesseract OCR, Pillow and pytesseract, I saved your image as igor.png and ran the following code, taken from the pytesseract docs:

#!/usr/bin/env python

from PIL import Image
import pytesseract

print(pytesseract.image_to_string(Image.open("igor.png")))

It prints the expected result:

1BB
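Note that the output is 1BB, without the space in the '1 BB' the question mentions, and the raw string returned by pytesseract may also carry trailing whitespace such as a newline. If the result is compared against an expected value, normalizing whitespace first keeps such differences from counting as failures. A minimal sketch; normalize_ocr_text is a hypothetical helper, not part of pytesseract.

```python
def normalize_ocr_text(text: str) -> str:
    """Drop all whitespace so that '1 BB' and '1BB' compare equal."""
    # str.split() with no argument splits on any run of whitespace,
    # including spaces, newlines and form feeds.
    return "".join(text.split())


# Example strings only; real OCR output would come from image_to_string.
raw_outputs = ["1BB", "1 BB", "1BB\n\x0c"]
for raw in raw_outputs:
    print(repr(raw), "->", normalize_ocr_text(raw))
```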

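If I make one small correction to your original code, adding the letter B to tessedit_char_whitelist, it works as well. With the whitelist set to 0123456789, Tesseract is only allowed to output digits, so it can never return the B characters.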
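Applied to the question's code, the fix only touches the config string. A minimal sketch that builds the original and corrected configs side by side; build_tesseract_config is a hypothetical helper, and it uses only the flags already present in the question (--psm, --oem and -c tessedit_char_whitelist=...). Actually running OCR requires Tesseract and pytesseract to be installed, so that call is left as a comment.

```python
import string


def build_tesseract_config(whitelist: str, psm: int = 10, oem: int = 3) -> str:
    """Return a Tesseract config string limiting output to the whitelisted characters."""
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


# The question's original config: digits only, so 'B' can never be produced.
old_config = build_tesseract_config(string.digits)

# Corrected config: digits plus the letter B.
new_config = build_tesseract_config(string.digits + "B")

print(old_config)
print(new_config)

# With Tesseract and pytesseract installed, pass it exactly as before:
# amount = pytesseract.image_to_string(img, config=new_config)
```

Also note that --psm 10 tells Tesseract to treat the image as a single character. Since 1BB is several characters, --psm 7 (treat the image as a single text line) may be a more natural fit, although the original --psm 10 also worked here once B was whitelisted.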

solution1
1 2019-11-11 00:03:10