简体   繁体   中英

Problems using Tesseract-OCR on Python

I'm new in programming and I'm trying to use Tesseract OCR to read the text of an image, but I can't make it work! I installed tesseract_OCR, pytesseract and pillow in my environment. Does anyone have a tip?

Input:

from PIL import Image 

import pytesseract

print( pytesseract.image_to_string( Image.open('phrase.jpg') ) ) 

Output:

 C:\Anaconda2\envs\ambiente36\python.exe 

 C:/Users/Simone/Desktop/curso_programacao/Ler_imagens/ler_imagens

Traceback (most recent call last):

File "C:\Anaconda2\envs\ambiente36\lib\site- 
packages\pytesseract\pytesseract.py", line 194, in run_and_get_output
run_tesseract(**kwargs)

File "C:\Anaconda2\envs\ambiente36\lib\site- 
packages\pytesseract\pytesseract.py", line 165, in run_tesseract
proc = subprocess.Popen(command, **subprocess_args())

File "C:\Anaconda2\envs\ambiente36\lib\subprocess.py", line 709, in __init__
restore_signals, start_new_session)

File "C:\Anaconda2\envs\ambiente36\lib\subprocess.py", line 997, in 
_execute_child 
startupinfo)

FileNotFoundError: [WinError 2] O sistema não pode encontrar o arquivo 
especificado

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:/Users/Simone/Desktop/curso_programacao/Ler_imagens/ler_imagens", 
line 6, in <module>
phrase = pytesseract.image_to_string(Image.open('phrase.jpg'))

File "C:\Anaconda2\envs\ambiente36\lib\site- 
packages\pytesseract\pytesseract.py", line 286, in image_to_string
return run_and_get_output(image, 'txt', lang, config, nice)

File "C:\Anaconda2\envs\ambiente36\lib\site- 
packages\pytesseract\pytesseract.py", line 201, in run_and_get_output
raise TesseractNotFoundError()

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed 
or it's not in your path

Steps you should follow to configure tessaract in your environment here are the steps you should follow

first install python and pip here are steps then install pillow, pytesseract as here

from PIL import Image
from pytesser.pytesser import *

image_file = "FULL/PATH/TO/YOUR/IMAGE/image.png"
im = Image.open(image_file)
text = image_to_string(im)
text = image_file_to_string(image_file)
text = image_file_to_string(image_file, graceful_errors=True)
print "=====output=======\n"
print text

link for download pytessaract you can find a complete example here

It seems like either Tesseract is not installed correctly or the path to tesseract does not point where tesseract was actually installed.

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

I suggest that you check your installations first by following the official documentation .

I've recently written a pretty simple guide to Tesseract but it should enable you to write your first OCR script and clear up some hurdles that I experienced when things were less clear than I would have liked in the documentation.

In case you'd like to check them out, here I'm sharing the links with you:

You need to install tesseract using windows installer available here . Then you should install the python wrapper as:

pip install pytesseract

Then you should also set the tesseract path in your script after importing pytesseract library as below (Please do not forget that installation path might be modified in your case!):

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

Note: It is tested on Anaconda3 without any issues.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM