pytesseract and image.tif file

Question

I need to transcribe a multi-page image.tif to text using pytesseract. I have the following code:

> from PIL import Image
> import pytesseract
> pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
> print(pytesseract.image_to_string(Image.open('CAMARA.tif'), lang="spa"))

The problem is that it only extracts the first page. How can I extract all of them?

Answer 1

I was able to fix the same problem by calling the convert() method, as shown below:

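from PIL import Image
import pytesseract
# imagePath is the path to your TIFF file, e.g. 'CAMARA.tif' from the question
image = Image.open(imagePath).convert("RGBA")
text = pytesseract.image_to_string(image)
print(text)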

Answer 2

It looks like you are passing only one image, "camara.tif". First you have to convert all the PDF pages into images; you can see this link for how to do so.

Next, use pytesseract in a loop over the images, one by one, to extract the text from each image.
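For a multi-page TIFF like the one in the question, that loop can be sketched as below. This is a minimal sketch, assuming Pillow's ImageSequence.Iterator is used to walk the pages; the helper names ocr_pages and ocr_tiff are hypothetical and not part of pytesseract.

```python
def ocr_pages(frames, ocr, page_sep="\n\n"):
    """Run the `ocr` callable on every frame and join the per-page text."""
    texts = []
    for frame in frames:
        texts.append(ocr(frame))
    return page_sep.join(texts)


def ocr_tiff(path, lang="spa"):
    """OCR every page of a multi-page TIFF (hypothetical helper)."""
    # Imported here so ocr_pages stays usable without Pillow/pytesseract
    from PIL import Image, ImageSequence
    import pytesseract

    with Image.open(path) as img:
        # The iterator seeks through the file one page at a time;
        # copy() gives each page its own image object.
        frames = (frame.copy() for frame in ImageSequence.Iterator(img))
        return ocr_pages(
            frames,
            lambda frame: pytesseract.image_to_string(frame, lang=lang),
        )
```

Usage would be something like print(ocr_tiff('CAMARA.tif')), with tesseract_cmd set as in the question if Tesseract is not on your PATH.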

Answer 3

I just stumbled over the same problem. What you could do is call tesseract directly:

# test.py
import subprocess

in_filename = 'file_0.tiff'
out_filename = 'out'
lang = 'spa'
subprocess.call(['tesseract', in_filename, '-l', lang, out_filename])

This would process all pages:

$ python test.py 
Tesseract Open Source OCR Engine v4.0.0-beta.1 with Leptonica
Page 1
Page 2
Page 3
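
If you need the recognized text back in Python, something along these lines may help. It is only a sketch: it assumes tesseract writes its result to the output base name plus '.txt', and that pages are separated by a form feed character (the default page separator in recent Tesseract 4 releases, which may differ in your build). build_command, split_pages and ocr_file are hypothetical helper names.

```python
import subprocess


def build_command(in_filename, out_base, lang):
    # tesseract <image> <output base> -l <language>
    return ['tesseract', in_filename, out_base, '-l', lang]


def split_pages(text, separator='\f'):
    # Assumption: pages are separated by a form feed; empty chunks are dropped
    return [page.strip() for page in text.split(separator) if page.strip()]


def ocr_file(in_filename, out_base='out', lang='spa'):
    # check=True raises CalledProcessError if tesseract exits with an error
    subprocess.run(build_command(in_filename, out_base, lang), check=True)
    # Assumption: the text output lands in <out_base>.txt
    with open(out_base + '.txt', encoding='utf-8') as f:
        return split_pages(f.read())
```

Calling ocr_file('file_0.tiff') would then return a list with one string per page, provided the assumptions above hold for your Tesseract version.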
