'list' object has no attribute 'read' error when using pdf2image

Question

I have this code:

tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')

I want to do something like this:

tex=pytesseract.image_to_string(Image.open(image_path),lang='mar')

Code

from PIL import Image
import pytesseract
import cv2
#import cv
import os
import pdf2image
import time
#from pikepdf import Pdf,PdfImage,Name
#defpdftopil()
PDF_PATH=r'C:\Users\Downloads\ViewPDF (1)_one_page.pdf'
img=pdf2image.convert_from_path(PDF_PATH)
tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')
print(tex)
cv2.nameWindow("Input image")
cv2.imshow("input Image",img)
cv2.waitKey(0)
cv2.destroyWindow("Test")
cv2.destroyWindow("Main")

Error

Traceback (most recent call last):
  File "D:\System\p\Python\lib\site-packages\PIL\Image.py", line 2882, in open
    fp.seek(0)
AttributeError: 'list' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\ocr.py", line 12, in <module>
    tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')
  File "D:\System\p\Python\lib\site-packages\PIL\Image.py", line 2884, in open
    fp = io.BytesIO(fp.read())
AttributeError: 'list' object has no attribute 'read'

Answer 1

The call

    pdf2image.convert_from_path(PDF_PATH)

returns a list of images, one for each page. The pdf2image project description (https://pypi.org/project/pdf2image/) states:

    images = convert_from_path('/home/belval/example.pdf')

where images will be a list of PIL Image representing each page of the PDF document.
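This also explains why the error mentions seek and read. The sketch below mimics only the two PIL lines visible in the traceback (Image.py, lines 2882 and 2884). The helper name open_like_pil is hypothetical, and this is not PIL's full Image.open(), which also handles file names and paths before reaching this step. A file-like object passes the seek(0) check, while a list fails it and then fails again on read(), which gives the same chained AttributeError as in the question.

```python
import io


def open_like_pil(fp):
    """Hypothetical helper mimicking the two traceback lines from PIL's Image.open().

    Not PIL's real implementation: only the file-object branch is reproduced.
    """
    try:
        fp.seek(0)  # a file-like object supports seek(); a list does not
    except (AttributeError, io.UnsupportedOperation):
        # Fallback for non-seekable streams: read everything into memory.
        # For a list this raises "'list' object has no attribute 'read'".
        fp = io.BytesIO(fp.read())
    return fp


# A file-like object passes the check unchanged.
buf = io.BytesIO(b"%PDF")
assert open_like_pil(buf) is buf

# A list (what convert_from_path() returns) reproduces the chained error.
try:
    open_like_pil([])
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'read'
```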

Solution

The PIL function Image.open() expects a filename, a path, or a file-like object, not a list. Moreover, the items in that list are already PIL Image objects, so Image.open() is not needed at all: pytesseract.image_to_string() accepts a PIL Image directly. Therefore, you could do one of two things:

1. Loop over the list returned by convert_from_path() and pass each item (that is, each page image) to pytesseract.image_to_string().
2. If you are certain that your PDF contains only one page, use only the first element (index 0) of the list returned by convert_from_path().
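Both options can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper names ocr_all_pages, ocr_first_page and ocr_pdf are hypothetical, and the OCR step is passed in as a function so the page-handling logic can be tried without Tesseract installed. Only ocr_pdf() needs pdf2image (with Poppler) and pytesseract (with the Marathi mar language data), so it imports them inside the function.

```python
from typing import Callable, List, Sequence


def ocr_all_pages(pages: Sequence, ocr: Callable[[object], str]) -> str:
    """Option 1: OCR every page image and join the texts with blank lines."""
    # Each item is already a PIL Image, so it goes to `ocr` directly,
    # with no Image.open() call.
    texts: List[str] = [ocr(page) for page in pages]
    return "\n\n".join(texts)


def ocr_first_page(pages: Sequence, ocr: Callable[[object], str]) -> str:
    """Option 2: OCR only the first page (for a PDF known to have one page)."""
    if not pages:
        raise ValueError("the page list is empty")
    return ocr(pages[0])


def ocr_pdf(pdf_path: str, lang: str = "mar") -> str:
    """Hypothetical helper: convert a PDF to page images and OCR all of them."""
    # Imported here so the helpers above work without these packages.
    import pdf2image
    import pytesseract

    pages = pdf2image.convert_from_path(pdf_path)  # list of PIL Images
    return ocr_all_pages(
        pages, lambda img: pytesseract.image_to_string(img, lang=lang)
    )
```

With the question's path, print(ocr_pdf(PDF_PATH)) would print the recognised text of every page. Passing the OCR function as a parameter is only a choice made for this sketch; calling pytesseract.image_to_string(page, lang='mar') directly inside the loop works the same way.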

solution1
2 2020-08-06 07:18:05
