
Convert scanned PDF to Plain Text using Ghostscript

I know that Ghostscript can convert a scanned PDF document into an OCR'd PDF document, using the command below:

> bin\gswin64c.exe -sDEVICE=pdfocr32 -o D:\OCR\outputOCRdPDF.pdf -r600 -dDownScaleFactor=3 InputScannedPDF.pdf

It uses the open-source Tesseract engine to do that. Going by the OCR devices available in Ghostscript, the output is another PDF document and not plain text (whereas Tesseract on its own can produce plain text as well).

It looks like I am missing something about the usage. Please correct me, or suggest how to convert to plain text instead of PDF.

  1. Convert the PDF to TIFF (in the case of a multipage document) or to another image format, with Ghostscript or MuPDF, or extract the images from the PDF (if the PDF was created with a scanner).
  2. OCR the images with Tesseract.
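
The two steps above can be sketched as a small Python wrapper around the two command-line tools. This is only a sketch under assumptions: it assumes `gswin64c` and `tesseract` are on the PATH, picks the `tiffgray` device at 300 dpi, and uses English as the OCR language. The function names `build_commands` and `pdf_to_text` are made up for illustration and are not part of either tool.

```python
import subprocess
from pathlib import Path


def build_commands(pdf, workdir, gs_exe="gswin64c", tesseract_exe="tesseract",
                   dpi=300, lang="eng"):
    """Return the Ghostscript and Tesseract argument lists for one PDF.

    All defaults (executable names, dpi, language) are assumptions;
    adjust them to match the local installation.
    """
    workdir = Path(workdir)
    stem = Path(pdf).stem
    tiff = workdir / (stem + ".tif")   # multipage TIFF written by Ghostscript
    text_base = workdir / stem         # Tesseract appends ".txt" to this base

    # Step 1: render every page of the PDF into one grayscale multipage TIFF.
    gs_cmd = [gs_exe, "-sDEVICE=tiffgray", f"-r{dpi}", "-o", str(tiff), str(pdf)]

    # Step 2: OCR the TIFF; the result goes to <text_base>.txt.
    ocr_cmd = [tesseract_exe, str(tiff), str(text_base), "-l", lang]
    return gs_cmd, ocr_cmd


def pdf_to_text(pdf, workdir, **options):
    """Run both steps and return the path of the resulting text file."""
    gs_cmd, ocr_cmd = build_commands(pdf, workdir, **options)
    Path(workdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(gs_cmd, check=True)   # raises CalledProcessError on failure
    subprocess.run(ocr_cmd, check=True)
    return Path(ocr_cmd[2] + ".txt")
```

Calling `pdf_to_text("InputScannedPDF.pdf", "D:/OCR")` would then leave the recognized text in `D:/OCR/InputScannedPDF.txt`, provided both tools are installed. Ghostscript builds that include Tesseract support also document an `ocr` device alongside `pdfocr32`, which is described as writing the recognized text directly; whether a given build offers it can be checked in the device list printed by `gswin64c -h`.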
