Pytesseract: "TesseractNotFound Error: tesseract is not installed or it's not in your path", how do I fix this?
I'm trying to run some basic, very simple code in Python:
from PIL import Image
import pytesseract
im = Image.open("sample1.jpg")
text = pytesseract.image_to_string(im, lang = 'eng')
print(text)
This is what it looks like. I have actually installed Tesseract for Windows through the installer. I'm very new to Python, and I'm unsure how to proceed.
Any guidance here would be very helpful. I've tried restarting my Spyder application, but to no avail.
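The error means pytesseract could not launch the `tesseract` executable. As a quick diagnostic, here is a minimal sketch using only the standard library (`find_tesseract` is a hypothetical helper name) that checks whether the executable is visible on `PATH` from inside the same Python process, for example the one Spyder runs:

```python
import shutil
from typing import Optional


def find_tesseract(name: str = "tesseract") -> Optional[str]:
    """Return the full path of the tesseract executable if it is on PATH, else None."""
    # shutil.which searches the directories listed in the PATH environment variable
    # (on Windows it also tries the extensions in PATHEXT, such as .exe).
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract is not on PATH: install it, or set tesseract_cmd to its full path")
    else:
        print("tesseract found at:", path)
```

A process inherits its environment when it starts, so a PATH edit made in the Windows system settings is only seen by programs launched after the change.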
I see the steps are scattered across different answers. Based on my recent experience with this pytesseract error on Windows, here are the steps in sequence to make the error easier to resolve:
1. Install Tesseract using the Windows installer available at: https://github.com/UB-Mannheim/tesseract/wiki
2. Note the Tesseract path from the installation. The default installation path at the time of this edit was:
C:\Users\USER\AppData\Local\Tesseract-OCR
It may change, so please check the installation path.
3. Install pytesseract:
pip install pytesseract
4. Set the Tesseract path in the script before calling image_to_string:
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'
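Because the install location varies (the answers in this thread mention several), a small sketch can try a list of candidate locations and use the first one that exists. The candidate paths are assumptions collected from this thread, and `first_existing` is a hypothetical helper name:

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate install locations mentioned in this thread (assumptions; adjust to your machine).
CANDIDATES = [
    str(Path.home() / "AppData" / "Local" / "Tesseract-OCR" / "tesseract.exe"),
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def first_existing(paths: Iterable[str]) -> Optional[str]:
    """Return the first path that points to an existing file, or None if none does."""
    for p in paths:
        if Path(p).is_file():
            return p
    return None


if __name__ == "__main__":
    cmd = first_existing(CANDIDATES)
    print("tesseract_cmd candidate:", cmd)
    # With pytesseract installed, you would then assign it:
    # pytesseract.pytesseract.tesseract_cmd = cmd
```

Checking the result for None before assigning it gives a clearer message than letting pytesseract fail later with TesseractNotFoundError.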
For Linux:
sudo apt-get update
sudo apt-get install libleptonica-dev
sudo apt-get install tesseract-ocr tesseract-ocr-dev
sudo apt-get install libtesseract-dev
For Mac:
brew install tesseract
For Windows: download the binary from https://github.com/UB-Mannheim/tesseract/wiki, then add
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
to your script.
pip install tesseract
pip install tesseract-ocr
References: https://pypi.org/project/pytesseract/ (INSTALLATION section) and https://github.com/tesseract-ocr/tesseract/wiki#installation
1 - You need to have Tesseract OCR installed on your computer. Get it from here:
https://github.com/UB-Mannheim/tesseract/wiki
Download the suitable version.
2 - Add the Tesseract path to your system environment, i.e. edit the system variables.
3 - Run pip install pytesseract
and pip install tesseract
4 - Add this line to your Python script every time:
pytesseract.pytesseract.tesseract_cmd = 'C:/OCR/Tesseract-OCR/tesseract.exe' # your path may be different
5 - Run the code.
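Step 2 edits the system PATH, which only affects programs started after the change. As a process-local alternative, this sketch (the helper `prepend_to_path` and the directory are assumptions) prepends the Tesseract folder to PATH from inside the script, before pytesseract is called:

```python
import os


def prepend_to_path(directory: str) -> None:
    """Prepend `directory` to PATH for this Python process and the processes it starts."""
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        # Assigning to os.environ updates the real process environment, so child
        # processes launched afterwards (such as tesseract) inherit the new PATH.
        os.environ["PATH"] = os.pathsep.join([directory] + parts)


if __name__ == "__main__":
    # Hypothetical location; use the folder that actually contains tesseract.exe.
    prepend_to_path(r"C:\Program Files\Tesseract-OCR")
    print(os.environ["PATH"].split(os.pathsep)[0])
```

The change disappears when the Python process exits, so nothing outside the script is affected.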
This error occurs because Tesseract is not installed on your computer.
If you are using Ubuntu, install Tesseract using the following command:
sudo apt-get install tesseract-ocr
For Mac:
brew install tesseract
From https://pypi.org/project/pytesseract/:
pytesseract.pytesseract.tesseract_cmd = '<full_path_to_your_tesseract_executable>'
# Include the above line, if you don't have tesseract executable in your PATH
# Example tesseract_cmd: 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract'
On Windows:
pip install tesseract
pip install tesseract-ocr
then check the file stored on your system at
C:\Users\<user>\AppData\Local\Programs\Python\Python36\Lib\site-packages\pytesseract\pytesseract.py
and compile it.
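Several answers here edit pytesseract.py inside site-packages, whose location differs between Python installs, Anaconda and virtual environments. Instead of guessing the folder, a sketch with `importlib.util.find_spec` can print where a package lives (`module_origin` is a hypothetical helper name). For pytesseract it reports the package's `__init__.py`, and pytesseract.py sits in the same folder:

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module or package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Prints None if pytesseract is not installed in the interpreter running this script.
    print(module_origin("pytesseract"))
```

Edits made to files inside site-packages are overwritten when the package is reinstalled or upgraded, so setting `tesseract_cmd` in your own script tends to be less fragile.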
On Mac, you can install it as shown below. This works for me:
brew install tesseract
You can install this package: https://github.com/UB-Mannheim/tesseract/wiki. After that, go to the path C:\Program Files (x86)\Tesseract-OCR\tesseract.exe and run the tesseract file. I think this will help you.
On 64-bit Windows, just add the following to the PATH environment variable: "C:\Program Files\Tesseract-OCR", and it will work.
I was able to solve it by updating the tesseract_cmd variable in the pytesseract.py file with the bin/tesseract path.
I had the same issue on Windows. I tried updating the environment variables with the path of tesseract, which did not work.
What worked for me was to modify pytesseract.py, which can be found at C:\Program Files\Python37\Lib\site-packages\pytesseract
or usually in C:\Users\YOUR USER\APPDATA\Python
I changed one line as shown below:
#tesseract_cmd = 'tesseract'
tesseract_cmd = 'C:\Program Files\Tesseract-OCR\\tesseract.exe'
Note that I had to put an extra \ before tesseract, because otherwise Python interprets \t as a tab character, and you will get the error message below:
pytesseract.pytesseract.TesseractNotFoundError: C:\Program Files\Tesseract-OCR esseract.exe is not installed or it's not in your path
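The gap in that error message is a TAB character: in a normal Python string literal, `\t` is an escape sequence. This sketch contrasts the broken literal with three correct spellings of the same path (raw string, doubled backslashes, and the forward-slash form used in several answers here):

```python
# Three ways to write a Windows path correctly in Python:
escaped = "C:\\Program Files\\Tesseract-OCR\\tesseract.exe"  # every backslash doubled
raw = r"C:\Program Files\Tesseract-OCR\tesseract.exe"         # raw string: backslashes kept as-is
forward = "C:/Program Files/Tesseract-OCR/tesseract.exe"      # forward slashes, no escaping needed

# The mistake: an undoubled "\t" becomes a TAB, so the path no longer names a real file.
broken = "C:\\Program Files\\Tesseract-OCR\tesseract.exe"

print(repr(broken))   # the repr shows the tab as \t where "\t" from the path should be
print(escaped == raw)
```

A raw string is usually the least error-prone choice, because it keeps every backslash without having to think about which letter follows it.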
Step 1:
Install Tesseract on your system as per your OS. The latest installers can be found at https://github.com/UB-Mannheim/tesseract/wiki
Step 2: Install the following dependency libraries using pip:
pip install pytesseract
pip install opencv-python
pip install numpy
Step 3: Sample code
import cv2
import numpy as np
import pytesseract
from PIL import Image
from pytesseract import image_to_string
# Path of working folder on disk; replace with your working folder
src_path = "C:\\Users\\<user>\\PycharmProjects\\ImageToText\\input\\"
# If you don't have the tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
# Note: Tesseract only reads TESSDATA_PREFIX as an environment variable;
# this plain Python variable on its own has no effect
TESSDATA_PREFIX = 'C:/Program Files (x86)/Tesseract-OCR'
def get_string(img_path):
    # Read image with opencv
    img = cv2.imread(img_path)
    # Convert to gray
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Apply dilation and erosion to remove some noise
    kernel = np.ones((1, 1), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    img = cv2.erode(img, kernel, iterations=1)
    # Write image after removed noise
    cv2.imwrite(src_path + "removed_noise.png", img)
    # Apply threshold to get image with only black and white
    #img = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
    # Write the image after apply opencv to do some ...
    cv2.imwrite(src_path + "thres.png", img)
    # Recognize text with tesseract for python
    result = pytesseract.image_to_string(Image.open(src_path + "thres.png"))
    # Remove template file
    #os.remove(temp)
    return result
print('--- Start recognize text from image ---')
print(get_string(src_path + "image.png"))
print("------ Done -------")
You will need to install Tesseract:
https://github.com/tesseract-ocr/tesseract/wiki
Check out the above documentation on installation.
On Windows, for a default Tesseract installation, the command path must be redirected. Use the line that matches your install location:
# if Tesseract is installed under Program Files (x86):
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
# if Tesseract is installed under Program Files:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
Perhaps this is happening because, even if Tesseract is correctly installed, you have not installed your language, as was my case. Fortunately, this is very easy to fix, and I did not even need to mess with tesseract_cmd.
sudo apt-get install tesseract-ocr -y
sudo apt-get install tesseract-ocr-spa -y
tesseract --list-langs
Note that in the second line we have specified -spa for Spanish.
If the installation has been successful, you should get a list of your available languages, like:
List of available languages (3):
eng
osd
spa
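To check from Python whether a language is available before passing it as `lang=`, a sketch can parse that listing. It assumes the output format shown above (one header line, then one code per line); `parse_list_langs` is a hypothetical helper:

```python
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output (format as shown above)."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


sample = """List of available languages (3):
eng
osd
spa
"""
print(parse_list_langs(sample))  # ['eng', 'osd', 'spa']
```

With the list in hand, a check such as `"spa" in langs` before calling `image_to_string(..., lang="spa")` points directly at a missing language pack.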
I found this in this blog post (Spanish). There is also a post on installing the Spanish language on Windows (apparently not as easy).
Note: since the question uses lang = 'eng', this is likely not the answer in that specific case. But the same error may happen in this other situation, which is why I posted the answer here.
For Windows users only:
Install tesseract using:
pip install tesseract
and then add this line to your code, minding the "\\" before tesseract.exe:
pytesseract.pytesseract.tesseract_cmd = "C:\Program Files (x86)\Tesseract-OCR\\tesseract.exe"
Simply installing tesseract with conda worked for me:
conda install -c conda-forge tesseract
For Linux distributions (Ubuntu), try:
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
Install tesseract using the following command:
pip install tesseract
# {Windows 10 instructions}
# Before you use the script, you need to install the dependencies:
# 1. Download Tesseract from the official link:
#    https://github.com/UB-Mannheim/tesseract/wiki
# 2. Install Tesseract.
#    I chose this path
#    (replace "user" in the path below with the name of the user on your current machine):
#    C:\Users\user\AppData\Local\Tesseract-OCR\
# 3. Install Pillow for your Python version.
#    * The best way for me is this form (I'm using Python 3.7, and in CMD I run this
#      version of Python by typing py -3.7).
#    * If you are using another version of Python, first check how you start Python
#      from CMD; on some machines the command is different.
#    [examples]
#    =================================
#    PYTHON VERSION 3.7
#    python
#    python3
#    python3.7
#    py -3
#    py -3.7
#    PYTHON VERSION 3.6
#    python
#    python3
#    python3.6
#    py -3
#    py -3.6
#    PYTHON VERSION 2.7
#    python
#    python2
#    python2.7
#    py -2
#    py -2.7
#    ================================
#    We use pip to install the dependencies.
#    Because I start Python 3.7 with the command
#    py -3.7
#    I open CMD on the Windows machine and type the following line:
#    py -3.7 -m pip install pillow
# 4. Install pytesseract and tesseract for your Python version (same form as above).
#    Open CMD on the Windows machine and type the following lines:
#    py -3.7 -m pip install pytesseract
#    py -3.7 -m pip install tesseract
#!/usr/bin/python
from PIL import Image
import pytesseract
import os
import getpass

def extract_text_from_image(image_file_name_arg):
    # IMPORTANT
    # This is the path on my machine if you followed my installation instructions above.
    # If you don't put the right path for tesseract.exe, the script will not work.
    username = getpass.getuser()
    # The line above gets the username for your machine automatically
    tesseract_exe_path_installation = "C:\\Users\\" + username + "\\AppData\\Local\\Tesseract-OCR\\tesseract.exe"
    pytesseract.pytesseract.tesseract_cmd = tesseract_exe_path_installation
    # Specify the directory of your image files manually, or use the line below
    # if the images are in an "images" folder in the current working directory
    # image_dir = "D:\\GIT\\ai_example\\extract_text_from_image\\images"
    image_dir = os.getcwd() + "\\images"
    dir_seperator = "\\"
    image_file_name = image_file_name_arg
    # If your images are in a different format, change the extension (e.g. ".png")
    image_ext = ".jpg"
    image_path_dir = image_dir + dir_seperator + image_file_name + image_ext
    print("=============================================================================")
    print("image used is in the following path dir:")
    print("\t" + image_path_dir)
    print("=============================================================================")
    img = Image.open(image_path_dir)
    text = pytesseract.image_to_string(img, lang="eng")
    print(text)

# Change the name "image_1" to your image's name without the extension
# image_file_name_arg = "image_1"
image_file_name_arg = "image_2"
# image_file_name_arg = "image_3"
# image_file_name_arg = "image_4"
# image_file_name_arg = "image_5"
extract_text_from_image(image_file_name_arg)
# ==================================
# CREATED BY: SHERIFI
# e-mail: sherif_co@yahoo.com
# git-link for script: https://github.com/sherifi/ai_example.git
# ==================================
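The script above builds the image path by concatenating strings with a hard-coded "\\" separator, which only works on Windows. A pathlib-based sketch of the same step (`build_image_path` is a hypothetical helper, not part of the original script) produces the right separator on each OS:

```python
from pathlib import Path
from typing import Union


def build_image_path(image_dir: Union[str, Path], image_file_name: str, image_ext: str = ".jpg") -> Path:
    """Join directory, base name and extension without hand-written separators."""
    return Path(image_dir) / f"{image_file_name}{image_ext}"


# Same result as os.getcwd() + "\\images" + "\\" + "image_2" + ".jpg" on Windows,
# but also valid on Linux and macOS.
print(build_image_path(Path.cwd() / "images", "image_2"))
```

PIL's `Image.open` accepts a `Path` object directly, so the result can be passed on without converting it to a string.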
For Ubuntu 18.04:
If you are getting errors like
tesseract is not installed or it's not in your path
and
OSError: [Errno 12] Cannot allocate memory
that might be an issue with swap memory allocation. You can check this answer on allocating more swap memory. Hope that helps :)
https://askubuntu.com/questions/920595/fallocate-fallocate-failed-text-file-busy-in-ubuntu-17-04?answertab=active#tab-top
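`[Errno 12]` is the ENOMEM error code, which is a different failure from a missing executable. To tell the two apart in your own code, a sketch can inspect the `errno` attribute of a caught `OSError` (`is_out_of_memory` is a hypothetical helper name):

```python
import errno


def is_out_of_memory(exc: OSError) -> bool:
    """True if the OSError carries errno ENOMEM ("Cannot allocate memory")."""
    return exc.errno == errno.ENOMEM


err = OSError(errno.ENOMEM, "Cannot allocate memory")
print(err)                    # [Errno 12] Cannot allocate memory
print(is_out_of_memory(err))  # True
```

In practice you would wrap the OCR call in `try: ... except OSError as exc:` and branch on `is_out_of_memory(exc)` to decide whether to look at swap or at the Tesseract installation.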
There are already many nice answers to this problem, but I would like to share a wonderful site I came across when I couldn't solve the "TesseractNotFound Error: tesseract is not installed or it's not in your path". Please refer to this site: https://www.thetopsites.net/article/50655738.shtml
I realised that I got this error because I installed pytesseract with pip but forgot to install the binary. You are probably missing tesseract-ocr from your machine. Check the installation instructions here: https://github.com/tesseract-ocr/tesseract/wiki
On a Mac, you can just install it using Homebrew:
brew install tesseract
It should run fine after that!
Under Windows 10, the following method works for me:
Go to this link, download Tesseract and install it. The Windows version is available here: https://github.com/UB-Mannheim/tesseract/wiki
Find the script file pytesseract.py in C:\Users\User\Anaconda3\Lib\site-packages\pytesseract and open it. Change the following code from tesseract_cmd = 'tesseract' to: tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe' (this is the path where you installed Tesseract-OCR, so please check where you installed it and update the path accordingly).
You may also need to add C:/Program Files (x86)/Tesseract-OCR/ to your PATH environment variable.
Hope it works for you!
Solution for Ubuntu that worked for me:
I installed Tesseract on Ubuntu by following the link below:
https://medium.com/quantrium-tech/installing-tesseract-4-on-ubuntu-18-04-b6fcd0cbd78f
Later, I added trained data for a language to tessdata by following the link below:
There seems to be an issue with the latest version of the pip module, pytesseract==0.3.7. I downgraded it to pytesseract==0.3.6 and no longer see the error.
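Before pinning or downgrading, it helps to confirm which pytesseract version the running interpreter actually sees (IDEs often use a different environment than the terminal). A sketch using `importlib.metadata` (Python 3.8+; `installed_version` is a hypothetical helper):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pytesseract version:", installed_version("pytesseract"))
```

The downgrade itself is done with `pip install pytesseract==0.3.6`, ideally run as `python -m pip ...` with the same interpreter that prints the version above.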
For Windows, in simple steps:
Download the Windows version from https://github.com/UB-Mannheim/tesseract/wiki
Install it.
Write the following in your .py file (check the installed location):
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
img_text = pytesseract.image_to_string(Image.open(filename))
For me it worked by using single quotes:
pytesseract.pytesseract.tesseract_cmd =r'C:/Program Files/Tesseract-OCR/tesseract.exe'
Putting it inside double quotes was actually inserting an unwanted character automatically.
The tips above did not help me fix the problem, because the error mentioned in the question occurred when installing pytesseract (PyCharm, Python 2.7). The odd thing was that tesseract worked from the command line, so the installation had been done correctly.
I was able to fix this problem by following these steps:
Subsequently, the image-to-text function worked in Python 2.7.
Anaconda installation:
Works on Mac, Linux, and Windows
conda-forge/packages/tesseract 4.1.1
Step 1:
conda install -c conda-forge tesseract
Step 2: Find the Tesseract path if you haven't already:
import os
for r, s, f in os.walk("/"):
    for i in f:
        if "tesseract" in i:
            print(os.path.join(r, i))
For example, my Tesseract path is /anaconda/bin/tesseract
Step 3: Point pytesseract at that path:
pytesseract.pytesseract.tesseract_cmd = r'/anaconda/bin/tesseract'
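The loop in step 2 prints every file whose name merely contains "tesseract" (Python modules, data files, and so on) and walks the whole disk. A narrower sketch (`find_tesseract_binaries` is a hypothetical helper) matches only files named exactly `tesseract` or `tesseract.exe` and stops after a few hits:

```python
import os
from typing import List

EXECUTABLE_NAMES = {"tesseract", "tesseract.exe"}


def find_tesseract_binaries(root: str, limit: int = 10) -> List[str]:
    """Walk `root` and return up to `limit` files named exactly tesseract or tesseract.exe."""
    found = []
    # os.walk silently skips directories it cannot read (its onerror defaults to None).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() in EXECUTABLE_NAMES:
                found.append(os.path.join(dirpath, name))
                if len(found) >= limit:
                    return found
    return found


if __name__ == "__main__":
    import sys
    # Searching the active environment's prefix (e.g. the conda env) is much faster than "/".
    print(find_tesseract_binaries(sys.prefix))
```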
I already tried this one on my Raspberry Pi. I just changed the path from this:
'C:/Program Files/Tesseract-OCR/tesseract.exe'
(since it is for Windows) to this:
/usr/local/lib/python3.7/dist-packages
since that is the path I see whenever I run this command:
pip3 show pytesseract
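Note that `pip3 show pytesseract` reports where the pytesseract Python package is installed, not where the tesseract engine is. `tesseract_cmd` is the command pytesseract launches, so it should name the executable file itself (on Linux, `which tesseract` prints it). A small sketch (`is_executable_file` is a hypothetical helper) to validate a candidate value before assigning it:

```python
import os


def is_executable_file(path: str) -> bool:
    """True if `path` is an existing regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    for candidate in ["/usr/local/lib/python3.7/dist-packages", "/usr/bin/tesseract"]:
        # A directory such as dist-packages is never a valid tesseract_cmd.
        print(candidate, is_executable_file(candidate))
```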
I was also facing the same error while installing tesseract.
Based on my recent problem solving, I follow the steps below:
1. Install tesseract using the Windows installer available at: https://github.com/UB-Mannheim/tesseract/wiki
2. Note the tesseract path from the installation. The default installation path at the time of this edit was: C:\Users\USER\AppData\Local\Tesseract-OCR. It may change, so please check the installation path.
After installation, if it still shows the "not installed" error, press Windows + R and run your file path (C:\Program Files\Tesseract-OCR\tesseract.exe); this worked for me.
3. pip install pytesseract
For the Windows file path:
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
For Linux, installations will vary, but a Linux file path is given below:
pytesseract.pytesseract.tesseract_cmd = r'/home/user/bin/tesseract'
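The Windows + R check in step 2 can also be done from Python, which tests the exact value you plan to assign to `tesseract_cmd`. This sketch (`tesseract_version` is a hypothetical helper) runs the standard `tesseract --version` command. Some Tesseract versions may print the version to stderr, so the sketch falls back to it:

```python
import subprocess
from typing import Optional


def tesseract_version(cmd: str = "tesseract") -> Optional[str]:
    """Run `<cmd> --version` and return the first output line, or None if it cannot run."""
    try:
        result = subprocess.run([cmd, "--version"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing file (FileNotFoundError) and a non-executable path
        # such as a directory (PermissionError).
        return None
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    # Try the same string you would assign to tesseract_cmd.
    print(tesseract_version(r"C:\Program Files\Tesseract-OCR\tesseract.exe"))
```

If this returns None, pytesseract will fail with the same TesseractNotFoundError for that value.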
!sudo apt install tesseract-ocr
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
This helped in my case.
I recommend everyone watch this guy's video; he's great. None of the other answers solved my problem, but this did. Link: https://youtu.be/R4zK1-1lgCQ
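Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.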