Convert scanned PDF to Plain Text using Ghostscript

Question

I know there is a way to convert a scanned PDF document to an OCR'd PDF document with Ghostscript, using the command below:

> bin\gswin64c.exe -sDEVICE=pdfocr32 -o D:\OCR\outputOCRdPDF.pdf -r600 -dDownScaleFactor=3 InputScannedPDF.pdf

and it uses the open-source Tesseract engine to do that. Judging by the OCR devices available in Ghostscript, it converts to another PDF document and not to plain text (whereas Tesseract itself can also output plain text).

It looks like I am missing something about the usage. Please correct me, or suggest a way to convert to plain text instead of PDF.

Answer 1

Convert the PDF to TIFF (in the case of a multi-page document) or another image format with Ghostscript or MuPDF, or extract the images from the PDF (if the PDF was created with a scanner).
OCR the images with Tesseract.
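
The two steps above could be scripted roughly as follows. This is only a sketch under some assumptions: the executable names (gswin64c, tesseract) are expected to be on the PATH, the tiffgray device, the 300 dpi resolution and the "eng" language are illustrative choices, and the file names pages.tif and ocr_output are hypothetical. The helper functions are made up for this example and are not part of Ghostscript or Tesseract.

```python
import subprocess
from pathlib import Path


def ghostscript_tiff_command(pdf_path, tiff_path, gs_exe="gswin64c", dpi=300):
    """Build the Ghostscript call that renders every page into one multi-page TIFF."""
    # tiffgray and the dpi value are assumptions; pick what suits your scans.
    return [gs_exe, "-sDEVICE=tiffgray", f"-r{dpi}", "-o", str(tiff_path), str(pdf_path)]


def tesseract_text_command(tiff_path, output_base, tesseract_exe="tesseract", language="eng"):
    """Build the Tesseract call; Tesseract appends .txt to output_base itself."""
    return [tesseract_exe, str(tiff_path), str(output_base), "-l", language]


def pdf_to_text(pdf_path, work_dir="."):
    """Run both steps and return the path of the resulting text file."""
    work_dir = Path(work_dir)
    tiff_path = work_dir / "pages.tif"        # hypothetical intermediate file
    output_base = work_dir / "ocr_output"     # hypothetical output name
    subprocess.run(ghostscript_tiff_command(pdf_path, tiff_path), check=True)
    subprocess.run(tesseract_text_command(tiff_path, output_base), check=True)
    return output_base.with_suffix(".txt")


if __name__ == "__main__":
    # Uses the input file name from the question.
    print(pdf_to_text("InputScannedPDF.pdf"))
```

With check=True a failure in either tool raises an exception, so a missing executable or an unreadable PDF is not silently ignored.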

solution1
0 2021-03-05 19:36:15
