
Trouble installing tesseract-ocr package - "compile failed with error code 1 in /tmp/pip_build_root/tesseract-ocr"

I'm trying to install the tesseract-ocr package for use with pytesseract and am running into an odd issue. Installing everything else with pip worked, but when I tried sudo pip install tesseract-ocr as instructed here, I got the following errors:

Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_root/tesseract-ocr/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-zsaPkE-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_root/tesseract-ocr
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    load_entry_point('pip==1.5.4', 'console_scripts', 'pip')()
  File "/usr/lib/python2.7/dist-packages/pip/__init__.py", line 235, in main
    return command.main(cmd_args)
  File "/usr/lib/python2.7/dist-packages/pip/basecommand.py", line 161, in main
    text = '\n'.join(complete_log)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 42: ordinal not in range(128)

I have a feeling that the traceback is causing the UnicodeDecodeError. Does anyone have any ideas on how to resolve this?
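The last frame of the traceback shows pip 1.5.4 failing while joining its collected log lines, which on Python 2 triggers an implicit ASCII decode of any byte strings. A minimal sketch of that kind of failure follows, written for Python 3. The log line is a made-up example, and the idea that byte 0xe2 came from a UTF-8 curly quote in the build output is an assumption, not something the traceback confirms.

```python
# Hypothetical build output: in a UTF-8 locale a compiler may print curly
# quotes, and U+2018 is encoded as the bytes 0xE2 0x80 0x98.
raw_log_line = "error: \u2018foo\u2019 undeclared".encode("utf-8")


def decode_ascii(data):
    """Return the decoded text, or the error message if ASCII decoding fails."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return "failed: {}".format(exc)


# Decoding as ASCII fails on the first non-ASCII byte (0xe2 here).
print(decode_ascii(raw_log_line))

# Decoding with the encoding the bytes were actually written in succeeds.
print(raw_log_line.decode("utf-8"))
```

If this is what happened, the UnicodeDecodeError is a secondary problem in pip's error reporting, and the underlying failure is the setup.py build that exited with error code 1.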

The link provided only mentions using pip to install pytesseract, not Tesseract-OCR.

As mentioned there, you will also need the Python Imaging Library (PIL). If it is not installed on your system, you can install Pillow instead with sudo pip install pillow.

Tesseract-OCR cannot be installed with sudo pip install tesseract-ocr because it is not a Python module like pytesseract. From what I can see, Tesseract-OCR is written mostly in C++.

The link given, http://code.google.com/p/tesseract-ocr/, no longer hosts Tesseract-OCR, as the project has moved to https://github.com/tesseract-ocr/tesseract.

Install instructions can be found at https://github.com/tesseract-ocr/tesseract/wiki.

On Linux (Debian/Ubuntu), use sudo apt-get install tesseract-ocr, or sudo apt-get install tesseract-ocr-all to install all languages.

On Mac, use brew install tesseract, or brew install tesseract --all-languages to install all languages. You will need Homebrew installed; it can be found at https://brew.sh.

For Windows, an installer can be found at https://github.com/tesseract-ocr/tesseract/wiki/Downloads/. The current stable version should come with all languages included.
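Once Tesseract-OCR is installed through the system package manager, it can help to confirm from Python that the executable is actually reachable before calling pytesseract. The sketch below is one way to do that, assuming Python 3 and that the executable is named tesseract; the helper names find_tesseract, tesseract_version, and ocr_image are made up for illustration, and ocr_image additionally assumes pytesseract and Pillow are installed.

```python
import shutil
import subprocess


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


def tesseract_version(path):
    """Run `tesseract --version` and return the first line it prints."""
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some releases print the version to stderr
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


def ocr_image(image_path):
    """OCR an image file; needs pytesseract, Pillow, and the tesseract binary."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path))


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it with your system package manager")
    else:
        print(tesseract_version(path))
```

If find_tesseract returns None even though the install succeeded, the install directory is probably not on PATH, which is common on Windows.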
