
Problems using Tesseract-OCR on Python

I'm new to programming and I'm trying to use Tesseract OCR to read the text in an image, but I can't make it work! I installed tesseract_OCR, pytesseract and pillow in my environment. Does anyone have a tip?

Input:

from PIL import Image
import pytesseract

print(pytesseract.image_to_string(Image.open('phrase.jpg')))

Output:

C:\Anaconda2\envs\ambiente36\python.exe C:/Users/Simone/Desktop/curso_programacao/Ler_imagens/ler_imagens

Traceback (most recent call last):
  File "C:\Anaconda2\envs\ambiente36\lib\site-packages\pytesseract\pytesseract.py", line 194, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Anaconda2\envs\ambiente36\lib\site-packages\pytesseract\pytesseract.py", line 165, in run_tesseract
    proc = subprocess.Popen(command, **subprocess_args())
  File "C:\Anaconda2\envs\ambiente36\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Anaconda2\envs\ambiente36\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] O sistema não pode encontrar o arquivo especificado

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Simone/Desktop/curso_programacao/Ler_imagens/ler_imagens", line 6, in <module>
    phrase = pytesseract.image_to_string(Image.open('phrase.jpg'))
  File "C:\Anaconda2\envs\ambiente36\lib\site-packages\pytesseract\pytesseract.py", line 286, in image_to_string
    return run_and_get_output(image, 'txt', lang, config, nice)
  File "C:\Anaconda2\envs\ambiente36\lib\site-packages\pytesseract\pytesseract.py", line 201, in run_and_get_output
    raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

To configure Tesseract in your environment, follow these steps.

First install Python and pip, then install Pillow and pytesseract.

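from PIL import Image
from pytesser.pytesser import *  # provides image_to_string and image_file_to_string

image_file = "FULL/PATH/TO/YOUR/IMAGE/image.png"
im = Image.open(image_file)

# Three alternative calls; each one overwrites the previous result.
text = image_to_string(im)               # OCR an already opened PIL image
text = image_file_to_string(image_file)  # OCR directly from the file path
text = image_file_to_string(image_file, graceful_errors=True)  # same, with graceful_errors enabled

# Parenthesized print works as written on both Python 2 and Python 3.
print("=====output=======\n")
print(text)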

Download link for pytesseract. You can find a complete example here.

It seems that either Tesseract is not installed correctly, or the path to tesseract does not point to where Tesseract was actually installed.

pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your path

I suggest that you check your installation first by following the official documentation.
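A quick way to check whether the tesseract executable can be started at all is to try running it yourself. The sketch below is only an illustration, not part of pytesseract: the helper name `tesseract_version` is made up, and it assumes the executable is called `tesseract` and accepts `--version`. It fails in the same way as the `subprocess.Popen` call in the traceback when the executable cannot be found.

```python
import subprocess


def tesseract_version(cmd="tesseract"):
    """Return the first line printed by `<cmd> --version`, or None if cmd cannot be started.

    Hypothetical helper for illustration; it is not part of pytesseract.
    """
    try:
        result = subprocess.run(
            [cmd, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr in case the banner is printed there
            universal_newlines=True,
        )
    except OSError:
        # FileNotFoundError (the [WinError 2] in the traceback) is a subclass of OSError.
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


version = tesseract_version()
if version is None:
    print("tesseract could not be started: it is not installed or not on PATH")
else:
    print("found:", version)
```

If this reports that tesseract could not be started, pytesseract will not be able to start it either until the executable is installed or its location is configured.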

I've recently written a pretty simple guide to Tesseract. It should enable you to write your first OCR script and clear up some hurdles I ran into where the documentation was less clear than I would have liked.

In case you'd like to check them out, I'm sharing the links here:

You need to install Tesseract using the Windows installer available here. Then install the Python wrapper:

pip install pytesseract

Then set the tesseract path in your script after importing the pytesseract library, as below (keep in mind that the installation path may be different in your case!):

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'

Note: This was tested on Anaconda3 without any issues.
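Since the installation path can differ between machines, one option is to look the executable up instead of hard-coding it. The following is a minimal sketch under stated assumptions: `locate_tesseract`, `read_text` and `CANDIDATE_PATHS` are hypothetical names, and the two candidate folders are only common guesses for a Windows install, so adjust them to your setup. Only `pytesseract.pytesseract.tesseract_cmd` and `pytesseract.image_to_string` come from the snippets above.

```python
import os
import shutil

# Assumed install locations for illustration only; change them to match your machine.
CANDIDATE_PATHS = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
)


def locate_tesseract(candidates=CANDIDATE_PATHS):
    """Return a path to the tesseract executable, or None if none is found."""
    on_path = shutil.which("tesseract")  # first choice: whatever is on PATH
    if on_path:
        return on_path
    for path in candidates:  # otherwise try the candidate locations in order
        if os.path.isfile(path):
            return path
    return None


def read_text(image_path, tesseract_cmd=None):
    """Run OCR on image_path, pointing pytesseract at tesseract_cmd if it is given."""
    # Imported here so that locate_tesseract() can be used without these packages.
    from PIL import Image
    import pytesseract

    cmd = tesseract_cmd or locate_tesseract()
    if cmd is None:
        raise RuntimeError("tesseract executable not found; install it or pass tesseract_cmd")
    pytesseract.pytesseract.tesseract_cmd = cmd
    return pytesseract.image_to_string(Image.open(image_path))
```

With Pillow and pytesseract installed, the script from the question would then reduce to `print(read_text('phrase.jpg'))`.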
