How to install leptonica + tesseract on Windows without Visual Studio for use in Anaconda?

Question

I want to perform text recognition on images, and I want to use Python. I installed Anaconda. Now I want to install Tesseract, but I also need to install Leptonica. I did not find any clear instructions on how to do this on Windows. For Leptonica, I do not want to install Visual Studio. Could anybody provide clear instructions on how to install Leptonica and Tesseract on Windows, without Visual Studio, for use in Anaconda? Thanks.

Answer 1

Here is a simple set of steps to get the tesseract 3.05 dev version (as of 04/22/2016) working on both Windows 7 and Windows 8 machines:

1- Install tesseract using the executable installer from the official tesseract-ocr page (version 3.02 for Windows will suffice).

2- Download the following two files for the tesseract 3.05 dev version from http://domasofan.spdns.eu/tesseract/

There are 2 exe files:

tesseract-core-yyyymmdd.exe: the Tesseract core application, without language data
tesseract-langs-yyyymmdd.exe: all the language data available for Tesseract

(yyyymmdd means a 4-digit year, a 2-digit month and a 2-digit day.)
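
If several builds have been downloaded, the yyyymmdd part of the name can be used to tell them apart. The following is a minimal Python sketch, assuming the files follow exactly the naming shown above; the helper name parse_installer_name and the example file name are hypothetical and only for illustration.

```python
import re
from datetime import date, datetime

# Matches the two installer names described above; the date part is yyyymmdd.
INSTALLER_RE = re.compile(r"^tesseract-(core|langs)-(\d{8})\.exe$")


def parse_installer_name(filename):
    """Return (package, build_date) for a name like tesseract-core-yyyymmdd.exe.

    Raises ValueError if the name does not match or the date is not valid.
    """
    match = INSTALLER_RE.match(filename)
    if match is None:
        raise ValueError(f"not a recognised installer name: {filename!r}")
    package, stamp = match.groups()
    # strptime rejects impossible dates such as month 13.
    return package, datetime.strptime(stamp, "%Y%m%d").date()


# Hypothetical file name, used only for illustration.
print(parse_installer_name("tesseract-core-20160422.exe"))
```

Sorting a list of such names by the returned date would put the most recent build last.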

The app is portable, so you can install it on a USB stick or in another location.

Sub-steps to install these:

Download the tesseract-core and tesseract-langs packages.
Double-click the tesseract-core package and extract it to a directory of your choice (for example, a new temporary folder called "Tess_temp").
Double-click the tesseract-langs package and extract it to the tessdata subfolder of that same directory. For example, if tesseract-core was extracted to C:\Tess_temp, tesseract-langs needs to go to C:\Tess_temp\tessdata.
Now copy everything in "Tess_temp" to the folder where tesseract 3.02 was installed in step 1 above (usually C:\Program Files (x86)\Tesseract-OCR), replacing the 3.02 files with the 3.05 ones.
It should now work with the 3.05 version on Windows. Copy a sample image test.png (containing text) to this Tesseract-OCR folder, open a cmd window and type the following commands:
Go to the tesseract folder: cd C:\Program Files (x86)\Tesseract-OCR
Run tesseract on test.png: tesseract -l eng test.png test_text -psm 6

It will show you:

Tesseract Open Source OCR Engine v3.05.00dev with Leptonica

Congratulations! (Check test_text.txt for the extracted text.)
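
Since the question asks about using this from Anaconda, the same command-line test could be driven from Python with only the standard library. This is a rough sketch, not part of the original steps: the function names build_tesseract_command and run_ocr are made up for illustration, it assumes tesseract is on PATH (otherwise pass the full path to tesseract.exe as exe), and it assumes the output text is UTF-8. The -psm spelling matches the 3.x command used above; Tesseract 4 and later spell it --psm.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="eng", psm=6, exe="tesseract"):
    """Build the same argument list as the cmd example above."""
    return [exe, "-l", lang, str(image), str(output_base), "-psm", str(psm)]


def run_ocr(image, output_base, **options):
    """Run tesseract and return the recognised text.

    Tesseract appends ".txt" to output_base when writing the result.
    """
    command = build_tesseract_command(image, output_base, **options)
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(command, check=True)
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

With test.png in the current directory, run_ocr("test.png", "test_text") would run the command from the steps above and return the contents of test_text.txt.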

Answer 1: score 5, accepted, 2016-04-22 16:35:00
