
Why does pytesseract not recognize correctly?

I've been trying to preprocess my image in every way I can think of, but I cannot seem to find settings that work.

This is the image: [image]

As you can see, the picture is already about as simple as it gets, but pytesseract still cannot recognize '1 BB' in it. Any tips?

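import numpy as np
import pytesseract
from PIL import Image, ImageEnhance

# img is assumed to be a grayscale NumPy array of the captured region
img = Image.fromarray(img)
imp_arr = np.asarray(img)
# Binarize: pixels below 140 become 0, pixels from 140 to 255 become 255
imp_arr = (np.floor(imp_arr / 140.0) * 255.0).astype('uint8')
img = Image.fromarray(imp_arr, mode='L')
# Upscale 3x, then 2x, then shrink to 30% of that size
width, height = img.size
img = img.resize((width*3, height*3), Image.BICUBIC)
width, height = img.size
img = img.resize((width*2, height*2), Image.HAMMING)
width, height = img.size
img = img.resize((int(width*0.3), int(height*0.3)), Image.BICUBIC)
# Darken slightly, then boost sharpness and contrast
img = ImageEnhance.Brightness(img).enhance(0.7)
img = ImageEnhance.Sharpness(img).enhance(2)
img = ImageEnhance.Contrast(img).enhance(2)
amount = pytesseract.image_to_string(img, config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')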

This is just one example of the adjustments I've tried in order to get the correct text out as a string. Sometimes it works, and other times it prints gibberish. The thing is, it needs to work every single time, especially for a picture as clear as this one. Does anyone have a simple solution to this problem? Thank you in advance.

After installing Tesseract OCR, Pillow and pytesseract, I saved your image as igor.png and ran the following code, which I found in the pytesseract docs:

#!/usr/bin/env python

from PIL import Image
import pytesseract

print(pytesseract.image_to_string(Image.open("igor.png")))

It prints the expected result:

1BB

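If I make one small correction to your original code and add the letter B to tessedit_char_whitelist, it works as well. With the whitelist set to digits only, Tesseract is not allowed to output a B at all, so it cannot return '1 BB'.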
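For reference, here is a minimal sketch of that whitelist fix. The helper names build_config and read_amount are made up for illustration, the file name igor.png is the one I used above, and the --psm and --oem values are simply copied from your code, so treat this as one possible arrangement rather than the definitive way to do it:

```python
def build_config(whitelist, psm=10, oem=3):
    """Build a Tesseract config string limiting output to `whitelist`.

    Hypothetical helper; psm=10 and oem=3 mirror the question's settings.
    """
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


def read_amount(image_path, whitelist="0123456789B"):
    """Run OCR on the image, allowing digits plus the letter B."""
    # Imported here so build_config can be used without Tesseract installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img, config=build_config(whitelist))
    return text.strip()


# Example call (requires Tesseract and the saved image):
# print(read_amount("igor.png"))
```

Note that --psm 10 tells Tesseract to treat the image as a single character. If the results are still unreliable, it may be worth trying --psm 7 (a single text line), for example build_config("0123456789B", psm=7), although I have not checked that on your image.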
