Tesseract OCR not working in Java on Linux

I deployed a WAR file with a Java backend to my server. I'm trying to get Tesseract to work from Java on CentOS, but it simply won't work. It works perfectly on my Windows localhost, though. The code I have is:

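import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.ocr.TesseractOCRConfig;
import org.apache.tika.parser.ocr.TesseractOCRParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

private void doOCR(File file) // The image file
        throws IOException, SAXException, TikaException
{
    // try-with-resources closes the stream even if parsing throws
    try (InputStream stream = new FileInputStream(file))
    {
        ContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        ParseContext context = new ParseContext();

        TesseractOCRConfig config = new TesseractOCRConfig();
        // TESSERACT_PATH is a constant defined elsewhere in the class
        config.setTesseractPath(TESSERACT_PATH);
        // Path on Windows is C://Tesseract-ocr and path on Linux is /usr/local/bin
        context.set(TesseractOCRConfig.class, config);

        TesseractOCRParser tessParser = new TesseractOCRParser();
        tessParser.parse(stream, handler, metadata, context);
        System.out.println(handler.toString()); // handler.toString() prints extracted text
    }
}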

This code works on Windows, but not on Linux. I can run Tesseract from the command line, however, and the output file contains the correct text. Tesseract just won't work from Java on Linux. Is there anything I'm missing here? Thanks!
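Since tesseract runs fine from the shell, one way to narrow this down is to check what the servlet container's JVM actually sees: Tomcat often runs as a dedicated user with a different environment than an interactive shell. The sketch below is a hypothetical diagnostic (the class `TesseractEnvCheck` and its methods are illustrative, not part of Tika). It assumes the configured directory is expected to contain the `tesseract` binary (`tesseract.exe` on Windows) and defaults to `/usr/local/bin`, the Linux path mentioned in the question.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic (not part of Tika): reports whether this JVM process
// can see and execute the tesseract binary in the configured directory.
public class TesseractEnvCheck {

    // Builds the expected executable path for the current OS.
    static Path executablePath(String tesseractDir) {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        return Paths.get(tesseractDir, windows ? "tesseract.exe" : "tesseract");
    }

    // True only if the file exists, is a regular file, and this JVM may execute it.
    static boolean canRun(Path exe) {
        return Files.isRegularFile(exe) && Files.isExecutable(exe);
    }

    public static void main(String[] args) {
        String dir = args.length > 0 ? args[0] : "/usr/local/bin";
        Path exe = executablePath(dir);
        // The user the container runs as may differ from your shell user.
        System.out.println("JVM user:        " + System.getProperty("user.name"));
        System.out.println("Executable:      " + exe + " runnable=" + canRun(exe));
        // Tesseract reads TESSDATA_PREFIX when locating language data;
        // print the value this process inherited (null if unset).
        System.out.println("TESSDATA_PREFIX: " + System.getenv("TESSDATA_PREFIX"));
    }
}
```

Running this inside the deployed application (or as the Tomcat user) and comparing its output with the same checks in your shell shows whether the two environments differ.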

OK, I figured out my problem. On Linux, the Tesseract files are stored in many different locations (e.g. some are in /etc/tomcat6, some are in /var/lib/tomcat6, etc.). On my Windows machine, all the files are stored in the same folder (Tesseract-ocr). I had set the path to the Tesseract executable on both machines, but I also needed to have all the Tesseract data files in the same location. Making this change fixed the problem.
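The fix described above comes down to making sure the directory Tesseract reads its language data from actually contains the `.traineddata` files. A minimal pre-flight check is sketched below, assuming one file per language named `<lang>.traineddata` (Tesseract's standard naming) inside a single `tessdata` directory; the class `TessdataCheck` and the default path are hypothetical. In Tika 1.x, `TesseractOCRConfig` also has a `setTessdataPath(String)` setter for pointing the parser at that directory; newer Tika versions may configure these paths differently, so check the documentation for your version.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight check: lists language data files missing from a
// tessdata directory, so a misconfigured path fails loudly before OCR runs.
public class TessdataCheck {

    // Returns the expected <lang>.traineddata paths that are not regular files.
    static List<Path> missingLanguageFiles(Path tessdataDir, List<String> languages) {
        List<Path> missing = new ArrayList<>();
        for (String lang : languages) {
            Path data = tessdataDir.resolve(lang + ".traineddata");
            if (!Files.isRegularFile(data)) {
                missing.add(data);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Example location only; substitute wherever your tessdata directory really is.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/local/share/tessdata");
        List<Path> missing = missingLanguageFiles(dir, List.of("eng"));
        if (missing.isEmpty()) {
            System.out.println("All language data found in " + dir);
        } else {
            System.out.println("Missing: " + missing);
        }
    }
}
```

Calling `missingLanguageFiles` at application startup with the same directory you give Tika turns a silent OCR failure into an explicit list of missing files.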

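Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.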

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM