
Java8, Tess4j: Optimize image for OCR with tesseract

I am working with Tesseract and already have OCR functionality working. I want to optimize the image so that the OCR results are better. Currently I am only making the image monochrome and scaling it to double its size. Even after that, I am having issues with smaller fonts.

I tried searching and found one of the top answers on the topic. Unfortunately, it works with Bitmap, and I cannot find any native class in Java that works with Bitmap. There is also an answer with Java code, but it again uses Bitmap and doesn't specify which package it comes from.

Where does BitmapImageUtil.convertToGrayscale() come from?

Code:

private String testOcr(String fileLocation, int attachId) {
    try {
        File imageFile = new File(fileLocation);
        BufferedImage img = ImageIO.read(imageFile);
        // Random base-32 identifier for the preprocessed image file
        String identifier = new BigInteger(130, random).toString(32);
        String blackAndWhiteImage = previewPath + identifier + ".png";
        File outputfile = new File(blackAndWhiteImage);
        // Convert to grayscale at the original dimensions
        BufferedImage bufferedImage = BitmapImageUtil.convertToGrayscale(img, new Dimension(img.getWidth(), img.getHeight()));
        // Scale to double the original size
        bufferedImage = Scalr.resize(bufferedImage, img.getWidth() * 2, img.getHeight() * 2);
        ImageIO.write(bufferedImage, "png", outputfile);

        ITesseract instance = Tesseract.getInstance();
        // Point to one folder above tessdata directory, must contain training data
        instance.setDatapath("/usr/share/tesseract-ocr/");
        // ISO 639-3 standard
        instance.setLanguage("deu");
        String result = instance.doOCR(outputfile);
        // result processing with regex.
        return result;
    } catch (IOException | TesseractException e) {
        // Surface I/O and OCR failures to the caller
        throw new IllegalStateException("OCR failed for " + fileLocation, e);
    }
}

BitmapImageUtil comes from the Apache FOP project ("FOP" = "Formatting Objects Processor").

The package is org.apache.fop.util.bitmap.

The source code for release 2.2 is publicly available.
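
If adding Apache FOP only for the grayscale step is undesirable, the same kind of preprocessing (grayscale plus 2x upscaling) can be approximated with JDK classes alone. The following is a minimal sketch, not the implementation used by BitmapImageUtil: the class name OcrPreprocess and the method toGrayscaleScaled are hypothetical names chosen for illustration, and the choice of bicubic interpolation and a white background are assumptions that may or may not suit a given set of scans.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Apache FOP or Tess4J.
final class OcrPreprocess {
    private OcrPreprocess() {}

    /** Returns a grayscale copy of src, enlarged by the given integer factor. */
    static BufferedImage toGrayscaleScaled(BufferedImage src, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        int width = src.getWidth() * factor;
        int height = src.getHeight() * factor;

        // Drawing into a TYPE_BYTE_GRAY image converts the colors to gray.
        BufferedImage out = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        try {
            // Assumption: transparent areas should become white, not black.
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            // Assumption: bicubic interpolation for smoother enlarged glyphs.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

In the question's method, a call such as OcrPreprocess.toGrayscaleScaled(img, 2) could stand in for the convertToGrayscale and Scalr.resize lines. Whether this improves recognition of small fonts depends on the input images, so it is worth comparing the OCR output of both variants.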
