How to whitelist characters in tess4j version 4.1.*

Question

The objective is to read numeric data (the digits 0-9) from an image. For this, I'm using Tess4j version 4.1.1.

<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.1.1</version>
</dependency>

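My sample code looks like this:

// Load the image and keep a reference to it so it can be passed to doOCR
BufferedImage img = ImageIO.read(new File("c:\\temp\\number1.jpg"));
ITesseract instance = new Tesseract();
String result = instance.doOCR(img);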

But for some reason, it misrecognizes some digits as letters. To minimize this error, I need to whitelist digits only.

This was possible in earlier Tess4j releases (3.0.**) with TessBaseAPI, but it does not seem to be available in the current 4.1.* version. Can someone help me set whitelist characters in TessAPI 4.1.* and later?

Answer 1

This feature has been broken since Tesseract 4.00-alpha and has not been fixed yet.

https://github.com/tesseract-ocr/tesseract/issues/751
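
While the whitelist is unavailable, one possible stopgap is to filter the string returned by doOCR so that only digits remain. The sketch below is plain Java and does not use any tess4j API; the class name DigitFilter, the method keepDigits, and the sample input are hypothetical names chosen for illustration. Note that this only discards non-digit characters; it cannot recover a digit that the engine misread as a letter.

```java
public class DigitFilter {

    // Returns only the ASCII digits 0-9 found in the OCR output,
    // dropping letters, whitespace, and punctuation.
    static String keepDigits(String ocrText) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < ocrText.length(); i++) {
            char c = ocrText.charAt(i);
            if (c >= '0' && c <= '9') {
                digits.append(c);
            }
        }
        return digits.toString();
    }

    public static void main(String[] args) {
        // Hypothetical OCR result in which a zero was misread as the letter O.
        String raw = "12O45\n";
        System.out.println(keepDigits(raw)); // prints "1245"
    }
}
```

In the question's code this would be applied as keepDigits(instance.doOCR(img)). Because the misread character is dropped rather than corrected, the result may be shorter than the number actually shown in the image.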

solution1
0 2018-08-06 14:16:55