
How to install leptonica+tesseract on Windows without Visual Studio to use in Anaconda?

I want to perform text recognition on images using Python. I installed Anaconda. Now I want to install Tesseract, but I also need to install Leptonica. I did not find any clear instructions on how to do this on Windows. For Leptonica, I do not want to install Visual Studio. Could anybody provide clear instructions on how to install Leptonica and Tesseract on Windows, without Visual Studio, for use in Anaconda? Thanks.

Here is a simple set of steps to get the tesseract 3.05 dev version (as of 04/22/2016) working on both Windows 7 and Windows 8 machines:

1- Install tesseract using the executable installer from the official tesseract-ocr page (version 3.02 for Windows will suffice).

2- Download the following two files for the tesseract 3.05 dev version from http://domasofan.spdns.eu/tesseract/

There are 2 exe files:

  • tesseract-core-yyyymmdd.exe: the Tesseract core application, without language data
  • tesseract-langs-yyyymmdd.exe: all the language data available for Tesseract

(yyyymmdd means a 4-digit year, a 2-digit month and a 2-digit day.)

The app is portable, so you can install it on a USB stick or in another location.

Sub-steps to install these:

  1. Download the tesseract-core and tesseract-langs packages.
  2. Double click the tesseract-core package and extract it to a directory of your choice (for example, a temporary new folder called "Tess_temp").
  3. Double click the tesseract-langs package and extract it into a tessdata subfolder of that same directory. For example, if tesseract-core was extracted to C:\Tess_temp, tesseract-langs needs to go to C:\Tess_temp\tessdata.

  4. Now copy everything in "Tess_temp" to the folder where tesseract 3.02 was installed in step 1 above (usually C:\Program Files (x86)\Tesseract-OCR), replacing the 3.02 files with the 3.05 ones.

  5. It should now work with the 3.05 version on Windows. To test it, copy a sample image test.png (containing text) into this Tesseract-OCR folder, open a cmd window and type the following commands:

    go to the tesseract folder: cd C:\Program Files (x86)\Tesseract-OCR

    run tesseract on test.png: tesseract -l eng test.png test_text -psm 6

It will show you:

Tesseract Open Source OCR Engine v3.05.00dev with Leptonica

Congratulations! (Check test_text.txt for the extracted text.)
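Since the goal is to use this from Python in Anaconda, the same command can be called through the standard library's subprocess module. This is only a minimal sketch, not an official binding: the helper names build_command and ocr_image are made up for illustration, and the install path is assumed to be the default one from step 4.

```python
import subprocess
from pathlib import Path

# Assumption: the default install folder from step 4; adjust if yours differs.
TESSERACT_DIR = Path(r"C:\Program Files (x86)\Tesseract-OCR")


def build_command(image, output_base, lang="eng", psm=6, exe="tesseract"):
    """Return the argument list for the command-line call used in step 5.

    Hypothetical helper, not part of tesseract itself.
    """
    # Tesseract 3.x spells the page segmentation flag "-psm".
    return [str(exe), "-l", lang, str(image), str(output_base), "-psm", str(psm)]


def ocr_image(image, output_base, tesseract_dir=TESSERACT_DIR, **options):
    """Run tesseract on `image` and return the recognized text (hypothetical helper)."""
    exe = Path(tesseract_dir) / "tesseract.exe"
    command = build_command(image, output_base, exe=exe, **options)
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(command, check=True)
    # Tesseract appends ".txt" to the output base name it is given.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

With test.png in the current folder, calling ocr_image("test.png", "test_text") should write test_text.txt and return its contents. Passing the arguments as a list avoids quoting problems with the space in "Program Files".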
