Tesseract: converting an image to a searchable PDF in Java

Question

I am trying to convert an image to a searchable PDF using Tesseract. The command-line option below works fine for me.

I am exploring a similar option in Java, but I am not sure what to pass as arguments. Below is my Java code:

import java.io.File;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class Mask2 {

    public static void main(String[] args) {

        File image = new File("D:\\ML\\Java\\img3.PNG");
        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("C:/Program Files/Tesseract-OCR/tessdata");
        tesseract.setLanguage("eng");
        tesseract.setPageSegMode(1);   // PSM 1: automatic page segmentation with OSD
        tesseract.setOcrEngineMode(1); // OEM 1: LSTM neural-net engine only
        try {
            // Not sure what arguments to pass here
            tesseract.createDocumentsWithResults(/* ??? */);
        } catch (TesseractException e) {
            e.printStackTrace();
        }
    }
}

Any suggestions or solutions would be very helpful.

Answer 1

You can create a list of render formats like this (you can add others):

// RenderedFormat is the nested enum net.sourceforge.tess4j.ITesseract.RenderedFormat
List<RenderedFormat> renderFormats = new ArrayList<RenderedFormat>();
renderFormats.add(RenderedFormat.PDF);

Then pass the path of the input file (PDF or image), the path of the output file without an extension, and the list of render formats you want to use:

tesseract.createDocuments("a/b/c/inputfile.PNG", "a/b/c/outputfile", renderFormats);
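Combining this answer with the settings from the question, a complete program might look like the sketch below. It assumes tess4j (with its bundled native Tesseract libraries) is on the classpath and reuses the question's Windows paths; the output base name `img3_searchable` is an arbitrary placeholder, not something tess4j requires.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import net.sourceforge.tess4j.ITesseract.RenderedFormat;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class SearchablePdf {

    public static void main(String[] args) {
        // Input image path taken from the question; adjust for your machine.
        File image = new File("D:\\ML\\Java\\img3.PNG");
        // Placeholder output base WITHOUT extension: the PDF renderer appends ".pdf".
        String outputBase = "D:\\ML\\Java\\img3_searchable";

        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("C:/Program Files/Tesseract-OCR/tessdata");
        tesseract.setLanguage("eng");
        tesseract.setPageSegMode(1);   // automatic page segmentation with OSD
        tesseract.setOcrEngineMode(1); // LSTM engine only

        // Request only a searchable PDF (image plus invisible text layer).
        List<RenderedFormat> formats = new ArrayList<>();
        formats.add(RenderedFormat.PDF);

        try {
            tesseract.createDocuments(image.getPath(), outputBase, formats);
            System.out.println("Wrote " + outputBase + ".pdf");
        } catch (TesseractException e) {
            e.printStackTrace();
        }
    }
}
```

Adding further entries to `formats` (for example `RenderedFormat.TEXT`) should write additional files next to the PDF using the same output base.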

Ciao!
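One pitfall with the second argument of `createDocuments`: it is an output base, and Tesseract's renderers append their own extension, so passing something like `outputfile.pdf` would likely produce `outputfile.pdf.pdf`. If your code starts from a full output file name, a small helper like the hypothetical `OutputBase.of` below (plain Java, not part of tess4j) can strip the extension first.

```java
// Hypothetical helper, not part of tess4j: converts a desired output file name
// such as "a/b/c/outputfile.pdf" into the extension-less base "a/b/c/outputfile".
public final class OutputBase {

    private OutputBase() {}

    public static String of(String desiredOutput) {
        // Index of the last path separator ('/' or '\\'), or -1 if there is none.
        int sep = Math.max(desiredOutput.lastIndexOf('/'), desiredOutput.lastIndexOf('\\'));
        int dot = desiredOutput.lastIndexOf('.');
        // Strip only a dot that lies inside the file name and is not its first
        // character, so "dir.v2/report" and ".hidden" are returned unchanged.
        if (dot > sep + 1) {
            return desiredOutput.substring(0, dot);
        }
        return desiredOutput;
    }

    public static void main(String[] args) {
        System.out.println(of("a/b/c/outputfile.pdf"));
    }
}
```

Only the last extension is removed, so `archive.tar.gz` becomes `archive.tar`; that is usually what you want for a single-format output name.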

Answer 1 (0 votes), posted 2023-01-31 08:15:02