Create a searchable PDF with Tesseract on Android
A client has hired me to create an Android application that performs OCR on an image using Tesseract and converts it into a searchable PDF.
Currently I am able to extract text from images using this code:
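String extractText(String imagePath)
{
    // dataPath (assumed to be a String field of the enclosing class) must be
    // the parent directory of the "tessdata" folder, not the folder itself.
    dataPath = Environment.getExternalStorageDirectory().toString() + "/Android/data/" + appContext.getPackageName() + "/";
    // Check the tessdata subfolder itself so the check matches the error message.
    File tessdata = new File(dataPath, "tessdata");
    if (!tessdata.exists() || !tessdata.isDirectory())
    {
        throw new IllegalArgumentException("Data path must contain subfolder tessdata!");
    }
    // decodeFile returns null when the file is missing or cannot be decoded.
    Bitmap image = BitmapFactory.decodeFile(imagePath);
    if (image == null)
    {
        throw new IllegalArgumentException("Could not decode image: " + imagePath);
    }
    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.init(dataPath, "eng");
    baseApi.setImage(image);
    String recognizedText = baseApi.getUTF8Text();
    // Release the native Tesseract resources.
    baseApi.end();
    return recognizedText;
}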
The code above accurately returns the text in the image as a string, but I don't know how to create a searchable PDF from this text.
getUTF8Text returns only plain text. You'd need to use the TessPDFRenderer API for PDF output.
https://github.com/tesseract-ocr/tesseract/tree/master/src/api
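To make the renderer suggestion concrete, here is a minimal sketch of how it might look from Java. It assumes the app uses the tess-two wrapper (the com.googlecode.tesseract.android package, which the TessBaseAPI calls in the question suggest), and that this wrapper exposes the native renderer as TessPdfRenderer together with beginDocument, addPageToDocument and endDocument on TessBaseAPI. The class name SearchablePdfWriter, the method writeSearchablePdf, the outputBasePath parameter and the document title are hypothetical names chosen for illustration. It also assumes the output path is passed without the ".pdf" extension and that the tessdata folder contains pdf.ttf next to eng.traineddata, which the Tesseract PDF renderer normally expects. Please check these points against the wrapper version you actually ship.

```java
import java.io.File;

import com.googlecode.leptonica.android.Pix;
import com.googlecode.leptonica.android.ReadFile;
import com.googlecode.tesseract.android.TessBaseAPI;
import com.googlecode.tesseract.android.TessPdfRenderer;

public final class SearchablePdfWriter {

    private SearchablePdfWriter() {
    }

    /**
     * Runs OCR on a single image and writes it out as a searchable PDF.
     *
     * @param dataPath       parent directory of the "tessdata" folder
     * @param imagePath      path of the image to recognize
     * @param outputBasePath output path WITHOUT ".pdf" (assumption: the
     *                       renderer appends the extension itself)
     * @return true if every step reported success
     */
    public static boolean writeSearchablePdf(String dataPath,
                                             String imagePath,
                                             String outputBasePath) {
        TessBaseAPI baseApi = new TessBaseAPI();
        try {
            if (!baseApi.init(dataPath, "eng")) {
                return false;
            }

            // Leptonica image handed to the OCR engine; null if decoding fails.
            Pix page = ReadFile.readFile(new File(imagePath));
            if (page == null) {
                return false;
            }

            TessPdfRenderer renderer = new TessPdfRenderer(baseApi, outputBasePath);
            try {
                // "OCR result" is an arbitrary document title.
                if (!baseApi.beginDocument(renderer, "OCR result")) {
                    return false;
                }
                // imagePath is passed again so the original image can be
                // embedded in the PDF behind the recognized text layer.
                boolean pageAdded = baseApi.addPageToDocument(page, imagePath, renderer);
                boolean closed = baseApi.endDocument(renderer);
                return pageAdded && closed;
            } finally {
                // Free the native renderer and image memory.
                renderer.recycle();
                page.recycle();
            }
        } finally {
            // Release the native Tesseract resources, as in the question's code.
            baseApi.end();
        }
    }
}
```

Under those assumptions, calling SearchablePdfWriter.writeSearchablePdf(dataPath, imagePath, outputDir + "/scan") would be expected to produce outputDir + "/scan.pdf". For a multi-page document, addPageToDocument could be called once per image between beginDocument and endDocument.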