
How to extract text from PDF image

I want to extract data from a PDF that contains an image. The image is a scanned form in which the letters are written inside small boxes; for example, in "name: test", each letter sits inside its own square box.

I have tried Tesseract OCR but could not get the desired result.

I have also tried the commercial ABBYY product, which worked, but I want to use a free, Java-based API.

Below is an example: [image not available]

Nicomsoft OCR SDK, which is a free SDK, extracted the text from my PDF, and the results are satisfactory.

It supports a wide range of technologies. I am now trying to integrate it into my application.

Link: https://www.nicomsoft.com/

As far as free goes in OCR, Tesseract is as good as it gets.
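One common reason OCR engines struggle with boxed form fields is that the box borders get read as part of the characters, so erasing the borders before recognition may help. The following is a minimal sketch of that preprocessing step using only the Java standard library. The class `BoxLineRemover`, the method `removeLongLines`, and the parameters `minRun` and `threshold` are hypothetical names chosen for illustration; they are not part of Tesseract or any OCR SDK. The sketch assumes the page is not skewed, so the borders are straight horizontal and vertical lines that are longer than any stroke of a letter.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of Tesseract or any OCR SDK.
final class BoxLineRemover {
    private static final int WHITE = 0xFFFFFF;

    // A pixel counts as dark when its average RGB brightness is below the threshold.
    static boolean isDark(BufferedImage img, int x, int y, int threshold) {
        int rgb = img.getRGB(x, y);
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r + g + b) / 3 < threshold;
    }

    // Returns a copy of src in which every horizontal or vertical run of dark
    // pixels that is at least minRun pixels long is painted white.
    static BufferedImage removeLongLines(BufferedImage src, int minRun, int threshold) {
        int w = src.getWidth();
        int h = src.getHeight();

        boolean[][] dark = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                dark[y][x] = isDark(src, x, y, threshold);
            }
        }

        boolean[][] erase = new boolean[h][w];

        // Mark long horizontal runs, row by row.
        for (int y = 0; y < h; y++) {
            int start = 0; // first pixel of the current dark run
            for (int x = 0; x <= w; x++) {
                if (x == w || !dark[y][x]) {
                    if (x - start >= minRun) {
                        for (int i = start; i < x; i++) erase[y][i] = true;
                    }
                    start = x + 1;
                }
            }
        }

        // Mark long vertical runs, column by column.
        for (int x = 0; x < w; x++) {
            int start = 0;
            for (int y = 0; y <= h; y++) {
                if (y == h || !dark[y][x]) {
                    if (y - start >= minRun) {
                        for (int i = start; i < y; i++) erase[i][x] = true;
                    }
                    start = y + 1;
                }
            }
        }

        // Copy the source, replacing marked pixels with white.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                out.setRGB(x, y, erase[y][x] ? WHITE : src.getRGB(x, y));
            }
        }
        return out;
    }
}
```

The cleaned image can then be passed to whichever OCR engine is in use. `minRun` needs to be larger than the tallest letter, otherwise long strokes of the letters themselves are erased; parts of letters that touch a border are lost in any case, so this is only a starting point.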

Alternatively, you could look at the Windows 10 UWP OCR offering.

I am not sure about the free ones out there, but you can definitely try TotalPDFConverterOCR.

It offers a wide range of features, such as converting to DOC, images, etc.
