
Python: Install Tesseract for Windows 7

My objective is to run OCR from Python 2.7 using Tesseract on a Windows 7 machine, but I am running into problems with the installation. I tried following the instructions here, but the links to "tesseract-core-yyyymmdd.exe" and "tesseract-langs-yyyymmdd.exe" no longer exist, and I can't find these .exe files anywhere else online. Here's what I have done so far:

  1. Installed Tesseract using the executable from the official tesseract-ocr page.
  2. Installed the packages "wand", "PIL", and "pyocr" via pip.

Now, if I do the following in Python:

    from wand.image import Image
    from PIL import Image as PI
    import pyocr
    import pyocr.builders
    import io

These packages load without problems, but pyocr.get_available_tools() gives me an empty list. I am sure this has to do with the missing installation .exe files above. Where can I find them? Or is there something else I am missing?

I just set up pytesseract and it works! I have Windows 10 and Python 2.7 installed.

All you need to do:

  1. Download the Microsoft Visual C++ Compiler for Python 2.7 from http://aka.ms/vcpython27 and install it (standard installation steps).
  2. Download pytesseract from https://pypi.python.org/pypi/pytesseract
  3. Unzip the file.
  4. Go to the directory that contains the unzipped files.
  5. Run the command "python setup.py install".
  6. (Optional) To test that it is installed, open a Python shell and run "import pytesseract".

I hope it works! Note that pytesseract is a Python wrapper around Google's Tesseract-OCR engine, not an OCR engine itself, so the Tesseract executable must also be installed.
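A successful "import pytesseract" only shows that the Python package is present. pytesseract calls the tesseract executable, so it can help to check whether that executable is visible on PATH. Below is a minimal sketch of such a check. The helper name find_on_path is made up for this example, and it walks PATH by hand so that it also runs on Python 2.7, where shutil.which is not available.

```python
import os


def find_on_path(name, path=None, pathext=None):
    """Return the full path of the first file called `name` on PATH, or None.

    Hypothetical helper: it only checks that the file exists, not that it
    can actually be executed.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        # On Windows, PATHEXT lists suffixes such as .EXE; elsewhere it is unset.
        pathext = os.environ.get("PATHEXT", "")
    suffixes = [""] + [ext for ext in pathext.split(os.pathsep) if ext]
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for suffix in suffixes:
            candidate = os.path.join(directory, name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("tesseract")
    if location is None:
        print("tesseract is not on PATH; add its install folder to PATH")
    else:
        print("tesseract found at " + location)
```

If the script reports that tesseract is missing even though it is installed, adding the install folder to the PATH environment variable and opening a new command prompt is the usual next step.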

Step [1] To install Tesseract, visit

https://github.com/UB-Mannheim/tesseract/wiki

The latest installers can be downloaded from there, e.g. tesseract-ocr-setup-3.05.02-20180621.exe, tesseract-ocr-w32-setup-v4.0.0-beta.1.20180608.exe, tesseract-ocr-w64-setup-v4.0.0-beta.1.20180608.exe (64 bit)

Step [2] Download the Microsoft Visual C++ Compiler for Python 2.7 from the link below: https://download.microsoft.com/download/7/9/6/796EF2E4-801B-4FC4-AB28-B59FBF6D907B/VCForPython27.msi

Step [3] Install pytesseract, the Python binding for Tesseract, using pip:

pip install pytesseract

Step [4] Optionally, install an image processing library for Python, e.g. Pillow:

pip install pillow

That's it, you are done! :)
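Once the steps above are done, a short script can confirm that the pieces work together. This is only a sketch under assumptions: the folders in CANDIDATE_PATHS are the installer defaults as far as I know and may differ on your machine, sample.png is a placeholder image name, and first_existing is a helper made up for this example. pytesseract.pytesseract.tesseract_cmd and pytesseract.image_to_string are part of pytesseract itself.

```python
import os

# Assumed default install locations; adjust to wherever the installer put Tesseract.
CANDIDATE_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def first_existing(paths):
    """Return the first path in `paths` that is an existing file, else None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None


def ocr_image(image_path, tesseract_path=None):
    """Run OCR on one image and return the recognized text."""
    # Imported here so the helper above can be used without these packages.
    import pytesseract
    from PIL import Image

    if tesseract_path is not None:
        # Tell pytesseract where the executable lives when it is not on PATH.
        pytesseract.pytesseract.tesseract_cmd = tesseract_path
    return pytesseract.image_to_string(Image.open(image_path))


if __name__ == "__main__":
    exe = first_existing(CANDIDATE_PATHS)
    if exe is None:
        print("Tesseract not found in the assumed locations")
    else:
        print(ocr_image("sample.png", tesseract_path=exe))
```

Setting tesseract_cmd explicitly avoids depending on PATH, which is handy when the installer did not update it.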

pip is the package manager for Python packages.

  1. Open cmd and run pip search "pytesseract" to see the latest version. (PyPI has since disabled the search API, so if this command fails, check the pytesseract page on PyPI instead.)
  2. Run pip install pytesseract for the latest version, or pip install pytesseract==0.3.0 for a specific version.
  3. In a Python shell on Windows, run import pytesseract to make sure the installation succeeded.
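To see which version pip actually installed, the package metadata can be queried from Python. The sketch below assumes Python 3.8 or later, where importlib.metadata is in the standard library; it will not run on Python 2.7. The function name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pytesseract")
    if version is None:
        print("pytesseract is not installed")
    else:
        print("pytesseract " + version + " is installed")
```

On the command line, pip show pytesseract reports the same information.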

Install both and you are done:

Binaries: https://github.com/UB-Mannheim/tesseract/wiki

Python wrapper: https://pypi.python.org/pypi/pytesseract
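With both installed, the check from the question can be repeated. The sketch below lists what pyocr detects; as far as I know, pyocr finds Tesseract by looking for the executable on PATH, so an empty list usually points to a PATH problem. tool_names is a helper made up for this example, while get_available_tools, get_name and get_available_languages come from pyocr.

```python
def tool_names(tools):
    """Return the display name of each OCR tool module found by pyocr."""
    return [tool.get_name() for tool in tools]


if __name__ == "__main__":
    try:
        import pyocr
    except ImportError:
        print("pyocr is not installed")
    else:
        tools = pyocr.get_available_tools()
        if not tools:
            print("No OCR tool found; check that tesseract is on PATH")
        else:
            print("Available tools: " + ", ".join(tool_names(tools)))
            print("Languages: " + ", ".join(tools[0].get_available_languages()))
```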


 