Trouble installing tesseract-ocr package - "compile failed with error code 1 in /tmp/pip_build_root/tesseract-ocr"
I'm trying to install the tesseract-ocr package for use with pytesseract and am running into an odd issue. Installing everything else with pip worked, but when I tried sudo pip install tesseract-ocr as instructed here, I got the following errors:
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_root/tesseract-ocr/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-zsaPkE-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_root/tesseract-ocr
Traceback (most recent call last):
File "/usr/bin/pip", line 9, in <module>
load_entry_point('pip==1.5.4', 'console_scripts', 'pip')()
File "/usr/lib/python2.7/dist-packages/pip/__init__.py", line 235, in main
return command.main(cmd_args)
File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 161, in main
text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 42: ordinal not in range(128)
I have a feeling that the traceback is causing the UnicodeDecodeError. Does anyone have any ideas on how to resolve this?
The link provided only mentions using pip to install pytesseract, not Tesseract-OCR.
As mentioned, you will also need the Python Imaging Library (PIL). If it is not installed on your system, you can install Pillow instead with sudo pip install pillow.
Tesseract-OCR itself is not installed with pip via sudo pip install tesseract-ocr, since it is not a Python module like pytesseract. From what I can see, Tesseract-OCR is written mostly in C++.
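Since Tesseract-OCR is a separate program rather than a Python module, one way to see whether it is already installed is to look for its executable on the PATH. The following is a minimal sketch using only the standard library; the helper names find_tesseract and tesseract_version are made up for illustration, and it assumes the binary is called tesseract, which is the usual name but may differ on your system.

```python
import shutil
import subprocess


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary, or None if it is not on PATH.

    The default name "tesseract" is an assumption about how it was installed.
    """
    return shutil.which(executable)


def tesseract_version(path):
    """Run `<path> --version` and return the first line of its output."""
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some builds print the version to stderr
        universal_newlines=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it with the system package manager")
    else:
        print(path, "->", tesseract_version(path))
```

If this reports that the binary is missing, pytesseract will not be able to run OCR either, because it calls the installed tesseract program rather than bundling it.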
The link given, http://code.google.com/p/tesseract-ocr/ , no longer hosts Tesseract-OCR, as the project has moved to https://github.com/tesseract-ocr/tesseract .
Install instructions can be found at https://github.com/tesseract-ocr/tesseract/wiki .
On Linux, use sudo apt-get install tesseract-ocr, or sudo apt-get install tesseract-ocr-all to install all languages.
On Mac, use brew install tesseract, or brew install tesseract --all-languages to install all languages. You will need Homebrew installed; it can be found at https://brew.sh .
For Windows, an installer can be found at https://github.com/tesseract-ocr/tesseract/wiki/Downloads/ . The current stable version should come with all languages included.
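To tie the per-platform instructions together, here is a small sketch that picks the matching suggestion based on platform.system(). The INSTALL_HINTS table and the install_hint function are hypothetical names for illustration only; the commands are copied from this answer, and the Linux entry assumes a Debian/Ubuntu-style system where apt-get is available.

```python
import platform

# Install suggestions quoted from this answer, keyed by platform.system() values.
# The "Linux" entry assumes an apt-based distribution such as Debian or Ubuntu.
INSTALL_HINTS = {
    "Linux": "sudo apt-get install tesseract-ocr",
    "Darwin": "brew install tesseract",
    "Windows": "run the installer from "
               "https://github.com/tesseract-ocr/tesseract/wiki/Downloads/",
}

# Fallback for platforms not covered above.
DEFAULT_HINT = "see https://github.com/tesseract-ocr/tesseract/wiki"


def install_hint(system=None):
    """Return the install suggestion for the given (or current) platform."""
    if system is None:
        system = platform.system()
    return INSTALL_HINTS.get(system, DEFAULT_HINT)


if __name__ == "__main__":
    print(install_hint())
```

Calling install_hint() with no argument uses the current machine; passing a name such as "Darwin" lets you look up the suggestion for another platform.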