Creating a searchable PDF with Tesseract in Android

Question

I have been hired by a client to create an Android application that performs OCR on an image using Tesseract and converts it into a searchable PDF.

Currently I am able to extract text from images using this code:

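  import android.graphics.Bitmap;
  import android.graphics.BitmapFactory;
  import android.os.Environment;
  import java.io.File;
  // Package name assumes the tess-two wrapper; adjust for the library in use.
  import com.googlecode.tesseract.android.TessBaseAPI;

  // dataPath (String) and appContext (Context) are fields of the enclosing class.
  String extractText(String imagePath) {
      dataPath = Environment.getExternalStorageDirectory().toString()
              + "/Android/data/" + appContext.getPackageName() + "/";

      // Tesseract looks for its language files in <dataPath>/tessdata.
      File tessdata = new File(dataPath, "tessdata");
      if (!tessdata.exists() || !tessdata.isDirectory()) {
          throw new IllegalArgumentException("Data path must contain subfolder tessdata!");
      }

      Bitmap image = BitmapFactory.decodeFile(imagePath);
      TessBaseAPI baseApi = new TessBaseAPI();
      try {
          baseApi.init(dataPath, "eng");
          baseApi.setImage(image);
          return baseApi.getUTF8Text();
      } finally {
          // Release native resources even if recognition throws.
          baseApi.end();
      }
  }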

The above code accurately returns the text in the image as a string, but I don't know how to create a searchable PDF from this text.

Answer 1

getUTF8Text returns only plain text, without the word positions a PDF text layer needs. For PDF output, you need to use the TessPDFRenderer API, which writes the page image together with an invisible, positioned text layer.
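As a rough illustration of what that could look like from Java, here is a minimal sketch. It assumes the tess-two Android wrapper, where the renderer is exposed as `TessPdfRenderer` and driven through `beginDocument`, `addPageToDocument` and `endDocument` on `TessBaseAPI`; other wrappers or versions may name these differently, so check the library you actually ship. The class name `SearchablePdfWriter`, the method `createSearchablePdf` and its parameters are hypothetical, chosen only for this example.

```java
import com.googlecode.leptonica.android.Pix;
import com.googlecode.leptonica.android.ReadFile;
import com.googlecode.tesseract.android.TessBaseAPI;
import com.googlecode.tesseract.android.TessPdfRenderer;

import java.io.File;

// Hypothetical helper; assumes the tess-two wrapper classes imported above.
final class SearchablePdfWriter {

    private SearchablePdfWriter() {}

    /**
     * Runs OCR on one image and writes outputBasePath + ".pdf".
     * dataPath must contain a tessdata folder with eng.traineddata and pdf.ttf.
     */
    static boolean createSearchablePdf(String dataPath, String imagePath, String outputBasePath) {
        TessBaseAPI baseApi = new TessBaseAPI();
        try {
            if (!baseApi.init(dataPath, "eng")) {
                return false;
            }
            // The renderer appends ".pdf" itself, so pass the path without an extension.
            TessPdfRenderer renderer = new TessPdfRenderer(baseApi, outputBasePath);
            Pix page = ReadFile.readFile(new File(imagePath));
            try {
                if (page == null || !baseApi.beginDocument(renderer, "Scanned document")) {
                    return false;
                }
                // Recognizes the page and embeds the original image behind the text layer.
                boolean pageAdded = baseApi.addPageToDocument(page, imagePath, renderer);
                boolean closed = baseApi.endDocument(renderer);
                return pageAdded && closed;
            } finally {
                if (page != null) {
                    page.recycle();
                }
                renderer.recycle();
            }
        } finally {
            baseApi.end();
        }
    }
}
```

For a multi-page document, `addPageToDocument` would be called once per image between `beginDocument` and `endDocument`.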

https://github.com/tesseract-ocr/tesseract/tree/master/src/api
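Two details of Tesseract's PDF renderer are easy to miss: it loads a font file named pdf.ttf from the tessdata folder, and it appends ".pdf" to the output base name it is given. The sketch below is plain Java that checks both before any OCR is attempted. The class and method names (`PdfRendererPrerequisites`, `missingTessdataFiles`, `rendererBasePath`) are hypothetical and not part of any Tesseract library.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Tesseract or any Android wrapper.
final class PdfRendererPrerequisites {

    private PdfRendererPrerequisites() {}

    /** Lists the files missing from dataPath/tessdata that PDF output in this language needs. */
    static List<String> missingTessdataFiles(File dataPath, String language) {
        File tessdata = new File(dataPath, "tessdata");
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {language + ".traineddata", "pdf.ttf"}) {
            if (!new File(tessdata, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    /** Strips a trailing ".pdf" so the renderer's own suffix is not doubled. */
    static String rendererBasePath(String pdfPath) {
        if (pdfPath.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            return pdfPath.substring(0, pdfPath.length() - ".pdf".length());
        }
        return pdfPath;
    }
}
```

If `missingTessdataFiles` returns a non-empty list, the app can copy the missing files into place (for example from its assets) before initializing Tesseract.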

Answer 1 (0, 2020-12-20 17:43:28)