I am writing a simple Android app that recognizes amounts of money (digits, commas, dots and currency symbols). I am using Tesseract, more specifically tess-two.
Code snippet:
```java
this.tessBaseAPI = new TessBaseAPI();
try {
    this.tessBaseAPI.setDebug(true);
    this.tessBaseAPI.init(path, "eng+snum"); //eng+osd+snum
    // Configure after init(): Tesseract's Init() may recreate its internal state,
    // discarding a page segmentation mode or variables set before it.
    this.tessBaseAPI.setPageSegMode(TessBaseAPI.PageSegMode.PSM_AUTO_ONLY); // note: the tesseract CLI default is PSM_AUTO (3)
    //EXTRA SETTINGS
    this.tessBaseAPI.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "0123456789 $€+=-,.");
    //this.tessBaseAPI.setVariable(TessBaseAPI.VAR_CHAR_BLACKLIST, "!@#$%^&*()_+=-[]}{;:'\"\\|~`,./<>?");
    this.tessBaseAPI.setImage(bitmap);
    this.text = tessBaseAPI.getUTF8Text();
    //this.text = tessBaseAPI.getHOCRText(0);
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // Release native resources even if recognition throws
    this.tessBaseAPI.end();
}
```
Sadly, I am not satisfied at all with Tesseract's accuracy. Preprocessing the image with a few binarization algorithms did improve it, but I would like to avoid preprocessing because the API I am using for it is heavy and time-consuming.
So how can I tune Tesseract to improve accuracy? So far I have only tried the whitelist. Is there anything else I can do?
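If the heavy preprocessing library is the bottleneck, a lightweight adaptive threshold written by hand might be a cheaper substitute. Tesseract already binarizes internally with a global Otsu threshold, which can do poorly when background brightness varies across the image, so a local threshold is the kind of step that can still add something. The sketch below is a hypothetical helper (`AdaptiveBinarizer` is not part of tess-two or Android) implementing Bradley-Roth style thresholding with an integral image, so each pixel costs a constant number of operations whatever the window size. It assumes dark text on a lighter background and has not been tested on the screenshots from this question.

```java
/**
 * Hypothetical helper (not part of tess-two or Android): Bradley-Roth style
 * adaptive thresholding on ARGB pixels, e.g. the array filled by Bitmap.getPixels.
 */
public class AdaptiveBinarizer {

    /** Converts one ARGB pixel to 8-bit luminance using integer BT.601 weights. */
    static int luminance(int argb) {
        int r = (argb >> 16) & 0xFF;
        int g = (argb >> 8) & 0xFF;
        int b = argb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;
    }

    /**
     * Returns opaque black/white pixels: a pixel becomes black when its luminance
     * is at least `percent`% below the mean of the window x window area around it.
     */
    static int[] binarize(int[] argb, int width, int height, int window, int percent) {
        int stride = width + 1;
        // integral[(y + 1) * stride + (x + 1)] = sum of luminance over rows 0..y, columns 0..x
        long[] integral = new long[stride * (height + 1)];
        for (int y = 0; y < height; y++) {
            long rowSum = 0;
            for (int x = 0; x < width; x++) {
                rowSum += luminance(argb[y * width + x]);
                integral[(y + 1) * stride + (x + 1)] = integral[y * stride + (x + 1)] + rowSum;
            }
        }

        int half = window / 2;
        int[] out = new int[width * height];
        for (int y = 0; y < height; y++) {
            // Clamp the window to the image borders
            int y0 = Math.max(0, y - half), y1 = Math.min(height - 1, y + half);
            for (int x = 0; x < width; x++) {
                int x0 = Math.max(0, x - half), x1 = Math.min(width - 1, x + half);
                long count = (long) (x1 - x0 + 1) * (y1 - y0 + 1);
                // Window sum from four integral-image lookups
                long sum = integral[(y1 + 1) * stride + (x1 + 1)]
                        - integral[y0 * stride + (x1 + 1)]
                        - integral[(y1 + 1) * stride + x0]
                        + integral[y0 * stride + x0];
                long lum = luminance(argb[y * width + x]);
                // lum <= mean * (100 - percent) / 100, rearranged to stay in integer math
                boolean dark = lum * count * 100 <= sum * (100 - percent);
                out[y * width + x] = dark ? 0xFF000000 : 0xFFFFFFFF;
            }
        }
        return out;
    }
}
```

On Android the input could come from `bitmap.getPixels(pixels, 0, w, 0, 0, w, h)` and the result could be wrapped with `Bitmap.createBitmap(out, w, h, Bitmap.Config.ARGB_8888)` before calling `setImage`. A window of roughly one eighth of the image width and `percent = 15` are common starting points, not values tuned for this app.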
Image:
I am not sure what you are doing differently, but running `tesseract screen.png -` on the command line recognizes all amounts correctly:
```text
Estimating resolution as 269
10:06 > = @
€— Movimenti e richieste
AGOSTO 2022
Amazon.it -5,73€
LUGLIO 2022
Paypal -2,19€
GIUGNO 2022
Amazon.it* -16,69€
B | Ricarica con carta +15,00€
B | Ricarica con carta +20,00€
Amazon.it* -20,83€
B | Ricarica con carta +10,00€
Amazon.it* -5,95€
-0,05€
Amazon.it*
```
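Since only the amounts matter, another cheap complement to tuning Tesseract is to validate its output afterwards and keep only tokens shaped like amounts; in the output above, that would discard status-bar noise such as `10:06 > = @`. The sketch below is a hypothetical post-processing helper (`AmountExtractor` is not part of tess-two), and its pattern assumes the Italian formatting visible in the screenshot: optional sign, `.` as thousands separator, `,` followed by two decimals, and a trailing euro sign.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical post-processing helper: keeps only OCR tokens shaped like euro amounts. */
public class AmountExtractor {

    // Assumed format: optional sign, integer part with '.' thousands separators (or none),
    // ',' decimal separator with exactly two digits, optional space, trailing euro sign.
    private static final Pattern AMOUNT =
            Pattern.compile("([+-]?)(\\d{1,3}(?:\\.\\d{3})*|\\d+),(\\d{2})\\s?\u20AC");

    /** Returns every amount found in the OCR text, in order of appearance. */
    static List<BigDecimal> extract(String ocrText) {
        List<BigDecimal> amounts = new ArrayList<>();
        Matcher m = AMOUNT.matcher(ocrText);
        while (m.find()) {
            // "-1.234,56" becomes "-1234.56", a form BigDecimal can parse
            String integerPart = m.group(2).replace(".", "");
            amounts.add(new BigDecimal(m.group(1) + integerPart + "." + m.group(3)));
        }
        return amounts;
    }
}
```

Feeding it the lines `Amazon.it -5,73€`, `Paypal -2,19€` and `B | Ricarica con carta +15,00€` yields the amounts -5.73, -2.19 and 15.00. Using `BigDecimal` rather than `double` avoids rounding errors if the amounts are later summed.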