Using Python Tesseract to get text from image, but getting an error

I'm attempting to use Python Tesseract to get text from an image on my macOS desktop and am running into an error that I cannot figure out. I'm running macOS High Sierra 10.13.2.

My directory is set to my desktop (where the image lives) and I already specified the path to my tesseract executable.
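
For reference, specifying the executable path for pytesseract usually looks something like the minimal sketch below. The path shown is an assumption (a common Homebrew location), and `is_executable_file` is a hypothetical helper, not part of pytesseract; adjust both to your own setup.

```python
import os

def is_executable_file(path):
    """Return True if path is an existing file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)

# Assumed location of the tesseract binary; yours may differ.
TESSERACT_PATH = "/usr/local/bin/tesseract"

try:
    import pytesseract
except ImportError:
    pytesseract = None  # pytesseract is not installed in this environment

if pytesseract is not None and is_executable_file(TESSERACT_PATH):
    # tesseract_cmd should name the tesseract binary itself,
    # not a Python script or the pytesseract package.
    pytesseract.pytesseract.tesseract_cmd = TESSERACT_PATH
```

Running `which tesseract` in a terminal is one way to confirm where the binary actually lives before hard-coding the path.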

I'm running

print(pytesseract.image_to_string(Image.open('test.png')))

and getting the following error:

File "/Users/name/anaconda2/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 140, in run_and_get_output
    run_tesseract(**kwargs)
  File "/Users/name/anaconda2/lib/python2.7/site-packages/pytesseract/pytesseract.py", line 116, in run_tesseract
    raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, u'File "/var/folders/cp/dg2snlxn2631h8jx1bwb7jk80000gn/T/tess_cK4lka.PNG", line 1 SyntaxError: Non-ASCII character \'\\x89\' in file /var/folders/cp/dg2snlxn2631h8jx1bwb7jk80000gn/T/tess_cK4lka.PNG on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details')

Any idea what might be causing this and how to get around it? Would be happy to provide any clarifying details.

Thanks!

It seems like you are trying to render a non-ASCII character. Try adding this line to the top of your .py file to declare UTF-8 encoding:

# -*- coding: utf-8 -*- 

As the error message says, see PEP 263 (http://python.org/dev/peps/pep-0263/) for more details.
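
For context on the `\x89` in the message: it is the first byte of the standard PNG file signature, and the file named in the SyntaxError is the temporary `.PNG` file, not the `.py` script. That suggests the image itself may have been read as Python source. The sketch below only illustrates where that byte comes from; the helper names are hypothetical and not part of pytesseract.

```python
# The 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(first_bytes):
    """Return True if the leading bytes match the PNG signature."""
    return first_bytes[:8] == PNG_SIGNATURE

def first_non_ascii(data):
    """Return the index of the first byte >= 0x80, or -1 if all bytes are ASCII."""
    for i, b in enumerate(bytearray(data)):
        if b >= 0x80:
            return i
    return -1
```

Calling `first_non_ascii` on the first few bytes of `test.png` (opened in `'rb'` mode) would show whether the offending byte sits at the very start of the image.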

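Use the unidecode library:

from unidecode import unidecode
from PIL import Image
import pytesseract

# ... rest of your script ...

# Transliterate the OCR result to plain ASCII before printing.
print(unidecode(pytesseract.image_to_string(Image.open('test.png'))))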
