How to whitelist characters in tess4j version 4.1.*

Question

The objective is to read numeric data (the digits 0 through 9) from an image. For this, I'm using Tess4j version 4.1.1.

<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.1.1</version>
</dependency>

My sample code looks like this:

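// Load the image and keep a reference to it so it can be passed to the OCR engine
BufferedImage img = ImageIO.read(new File("c:\\temp\\number1.jpg"));
ITesseract instance = new Tesseract();
// doOCR returns the recognized text as a String
String result = instance.doOCR(img);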

But for some reason, it misrecognizes some digits as letters. To minimize the error, I need to whitelist digits only.

This was possible in earlier Tess4j releases (3.0.**) with TessBaseAPI, but in the current 4.1.* version it's not available. Can someone help me out here: how do I set whitelist characters in TessAPI 4.1.* and later?

Answer 1

The feature has been broken since Tesseract 4.00-alpha. It has not been fixed yet.

https://github.com/tesseract-ocr/tesseract/issues/751
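While the whitelist is unavailable, one possible stopgap is to filter the recognized text after the fact. The sketch below is only that: `DigitFilter` and `keepDigits` are hypothetical names, not part of Tess4j, and the sample string stands in for whatever `instance.doOCR(img)` returns. It does not change how the engine recognizes characters; it simply discards anything that is not an ASCII digit, so a digit misread as a letter is lost rather than corrected.

```java
public class DigitFilter {

    // Hypothetical helper (not a Tess4j API): keeps only the ASCII digits 0-9
    // and drops every other character, including whitespace and line breaks.
    public static String keepDigits(String ocrText) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < ocrText.length(); i++) {
            char c = ocrText.charAt(i);
            if (c >= '0' && c <= '9') {
                digits.append(c);
            }
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the String returned by instance.doOCR(img).
        String ocrText = "12a4\n5 6";
        System.out.println(keepDigits(ocrText)); // prints 12456
    }
}
```

In the question's code this would be applied as `keepDigits(instance.doOCR(img))`. Because it runs on the output string, it cannot improve recognition accuracy the way a real whitelist would.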
