Tesseract on Linux crashes Glassfish

Question

We are using Tess4J/Tesseract to perform OCR in a webapp. On Windows everything works fine, but when deployed on a Linux machine the program crashes, kills the Glassfish process and writes a dump file: hs_err_pidXXXXX.log.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f9fdd5322a0, pid=10412, tid=140324597778176
#
# JRE version: Java(TM) SE Runtime Environment (7.0_75-b13) (build 1.7.0_75-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.75-b04 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libtesseract.so+0x2532a0]  ERRCODE::error(char const*, TessErrorLogCode, char const*, ...) const+0x190
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  T H R E A D  ---------------

Current thread (0x00007fa00c42d800):  JavaThread "pool-26-thread-1" [_thread_in_native, id=10705, stack(0x00007f9fddbdc000,0x00007f9fddcdd000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000000

The tesseract command works and correctly converts images to text. We have tried the LC_NUMERIC solution, but it still doesn't work.
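One thing that may be worth ruling out is whether the LC_NUMERIC setting actually reaches the Glassfish process, since a value exported in an interactive shell is not necessarily inherited by a server started another way, and LC_ALL overrides LC_NUMERIC when both are set. The following is a minimal diagnostic sketch, not part of the original code: `NumericLocaleCheck` is a hypothetical class name, and it only inspects environment variables using the POSIX precedence LC_ALL > LC_NUMERIC > LANG. It does not show what locale native code has set at runtime.

```java
import java.util.Map;

/** Hypothetical diagnostic: which environment variable governs LC_NUMERIC. */
public final class NumericLocaleCheck {

    private NumericLocaleCheck() {}

    /**
     * Applies the POSIX precedence LC_ALL > LC_NUMERIC > LANG to the given
     * environment and returns "NAME=value", or "unset" if none is non-empty.
     */
    public static String effectiveNumericLocale(Map<String, String> env) {
        for (String name : new String[] {"LC_ALL", "LC_NUMERIC", "LANG"}) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return name + "=" + value;
            }
        }
        return "unset";
    }

    public static void main(String[] args) {
        // System.getenv() is the environment the JVM was started with.
        System.out.println(effectiveNumericLocale(System.getenv()));
    }
}
```

Logging the result of `effectiveNumericLocale(System.getenv())` from inside the webapp, rather than from a shell, shows the environment the server JVM really inherited.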

Our Tesseract Java code is something like this:

File file; // ...
boolean hOcr; // ...
Rectangle rec; // ...
OcrResult result;
//Tesseract instance = Tesseract.getInstance();
Tesseract1 instance = new Tesseract1();
try {
    instance.setHocr(hOcr);            
    ImageIO.scanForPlugins();
    String res;
    if (rec == null) {
        res = instance.doOCR(file);
    } else {
        res = instance.doOCR(file, rec);
    }
    result = new OcrResult(res, 0, true);
} catch (TesseractException e) {
    log.error("error tesseract", e);
    // process error
} catch (Error e) {
    log.error("error tesseract", e);
    // process error
}

Our specs

Tesseract 3.02.02
Tess4J
CentOS 6.4
Java 1.7
Glassfish 4.1

Does anyone have any suggestions?

Answer 1

It turned out to be a combination of factors:

setting the datapath to TESSDATA_PREFIX in the server JVM settings in Glassfish
and, most importantly, applying patches to Tesseract (found here, credit to the author) for a known issue concerning the system locale; somehow the bug fixes were not applied in the latest versions
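The first point could look roughly like this on the Java side. This is a minimal sketch under assumptions that are not in the answer: that the path is passed to the server as a JVM option named TESSDATA_PREFIX (for example `-DTESSDATA_PREFIX=...`), with the environment variable of the same name as a fallback. `TessdataPathResolver` is a hypothetical helper, not part of Tess4J. It only resolves and validates the directory so that a missing path fails in Java rather than inside native code.

```java
import java.io.File;

/** Hypothetical helper: resolves the tessdata location before any OCR call. */
public final class TessdataPathResolver {

    private TessdataPathResolver() {}

    /**
     * Prefers the JVM system property, then the environment variable.
     * Returns null when neither holds a non-blank value.
     */
    public static String choose(String systemProperty, String environmentValue) {
        if (systemProperty != null && !systemProperty.trim().isEmpty()) {
            return systemProperty.trim();
        }
        if (environmentValue != null && !environmentValue.trim().isEmpty()) {
            return environmentValue.trim();
        }
        return null;
    }

    /** Resolves the path and rejects values that are unset or not a directory. */
    public static String resolve() {
        // Assumption: the option and variable are both named TESSDATA_PREFIX.
        String path = choose(System.getProperty("TESSDATA_PREFIX"),
                             System.getenv("TESSDATA_PREFIX"));
        if (path == null) {
            throw new IllegalStateException("TESSDATA_PREFIX is not set");
        }
        if (!new File(path).isDirectory()) {
            throw new IllegalStateException("Not a directory: " + path);
        }
        return path;
    }
}
```

With the code from the question, the result would be handed to Tess4J before the first OCR call, e.g. `instance.setDatapath(TessdataPathResolver.resolve());`.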

Answer 1 (accepted), 2015-02-23 09:21:50
