
pytesseract can't recognise digits from an image

The image I'm trying to analyze is the following:

[image not included]

I'm running this code:

from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'

my_image = 'C:\\autobot_wwe_supercard\\imagenes\\codigo_arriba.png'
text = pytesseract.image_to_string(Image.open(my_image))

print(text)

The result it gives me is:

[image of the output, not included]

I installed pytesseract from the console with pip install pytesseract.

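>>> import cv2
>>> import pytesseract
>>> img = cv2.imread("1299.png")
>>> gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
>>> # Otsu's method picks the binarization threshold automatically
>>> thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
>>> # invert the binarized image
>>> thresh = 255 - thresh
>>> data = pytesseract.image_to_string(thresh, config='--psm 11 digits')
>>> data
'1299'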

Try whitelisting digits in the configuration. pytesseract can sometimes extract white text on a black background as well.
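One way to whitelist digits explicitly is Tesseract's tessedit_char_whitelist variable, passed through the config string. The following is only a sketch: the helper names digits_config and read_digits are made up for illustration, and the page segmentation mode 11 is simply carried over from the snippet above. Depending on the Tesseract version and OCR engine in use, the whitelist may not be honoured, so treat the result as something to check rather than rely on.

```python
def digits_config(psm=11):
    """Build a Tesseract config string that restricts output to 0-9.

    `psm` is the page segmentation mode; 11 (sparse text) is an assumption
    taken from the snippet above and may need tuning for other images.
    """
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def read_digits(image, psm=11):
    """Run OCR on `image` (a PIL image or NumPy array) and return the text."""
    # Imported here so digits_config() can be used without Tesseract installed.
    import pytesseract

    text = pytesseract.image_to_string(image, config=digits_config(psm))
    return text.strip()
```

Calling read_digits(thresh) on the thresholded image from the snippet above would be the intended use.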

pytesseract is not the best choice for this. If you do use it, leave some padding around the text when you crop the region of interest.
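If the crop has already been made too tight, a margin can be added afterwards. Below is a minimal sketch using Pillow's ImageOps.expand. The helper name pad_for_ocr, the 20-pixel border, and the white fill are all assumptions: the fill should match the background of your own crop (white here assumes dark text on a light background).

```python
from PIL import Image, ImageOps


def pad_for_ocr(image, border=20, fill="white"):
    """Return a copy of `image` with a solid margin on every side.

    `border` is the margin width in pixels; `fill` is the margin colour
    and should match the background of the cropped region.
    """
    return ImageOps.expand(image, border=border, fill=fill)
```

The padded image can then be passed to pytesseract.image_to_string in place of the original crop.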

