I have this code:
tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')
I want to do something like this:
tex=pytesseract.image_to_string(Image.open(image_path),lang='mar')
Code
from PIL import Image
import pytesseract
import cv2
#import cv
import os
import pdf2image
import time
#from pikepdf import Pdf,PdfImage,Name
#defpdftopil()
PDF_PATH=r'C:\Users\Downloads\ViewPDF (1)_one_page.pdf'
img=pdf2image.convert_from_path(PDF_PATH)
tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')
print(tex)
cv2.namedWindow("Input image")
cv2.imshow("Input image",img)
cv2.waitKey(0)
cv2.destroyWindow("Input image")
Error
Traceback (most recent call last):
File "D:\System\p\Python\lib\site-packages\PIL\Image.py", line 2882, in open
fp.seek(0)
AttributeError: 'list' object has no attribute 'seek'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\ocr.py", line 12, in <module>
tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')
File "D:\System\p\Python\lib\site-packages\PIL\Image.py", line 2884, in open
fp = io.BytesIO(fp.read())
AttributeError: 'list' object has no attribute 'read'
The call
pdf2image.convert_from_path(PDF_PATH)
returns a list of images, one per page, not a single image. The pdf2image project description (https://pypi.org/project/pdf2image/) states:
images = convert_from_path('/home/belval/example.pdf')
where images will be a list of PIL Image representing each page of the PDF document.
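Because every element of that list is already an image, the Image.open(image_path) form from the question only makes sense if the pages are written to disk first. A minimal sketch, assuming PNG files are acceptable; the function name save_pages and the page_N.png naming scheme are hypothetical, chosen only for illustration. It relies only on PIL's Image.save(), which infers the file format from the extension.

```python
import os
from typing import List, Sequence


def save_pages(pages: Sequence, out_dir: str) -> List[str]:
    """Save each page image as a PNG and return the file paths in page order.

    `pages` is assumed to be the list of PIL Images returned by
    pdf2image.convert_from_path(); `save_pages` is a hypothetical helper.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for number, page in enumerate(pages, start=1):
        path = os.path.join(out_dir, f"page_{number}.png")
        page.save(path)  # PIL chooses PNG from the .png extension
        paths.append(path)
    return paths

# Usage (requires pdf2image, Poppler and Tesseract with Marathi data):
#   pages = pdf2image.convert_from_path(PDF_PATH)
#   for image_path in save_pages(pages, "pages"):
#       tex = pytesseract.image_to_string(Image.open(image_path), lang='mar')
#       print(tex)
```

Writing to disk is an extra round trip; it is only worth it if you actually need the image files afterwards.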
Solution
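The PIL function Image.open() expects a filename, path, or file object, not a list. When it is handed the list returned by convert_from_path(), it fails with the AttributeError shown above ('list' object has no attribute 'seek'). Moreover, the elements of that list are already PIL Image objects, so they do not need to be opened at all. Therefore, you could do one of two things: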
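A minimal sketch of the two options, assuming pages is the list returned by pdf2image.convert_from_path() and that Tesseract's Marathi language data ('mar') is installed. Option 1 passes a single page (here the first) straight to pytesseract.image_to_string(); option 2 runs OCR on every page and joins the results with newlines. The helper names ocr_first_page and ocr_all_pages, and the optional ocr parameter (which lets you swap in another function, for example when testing), are hypothetical and not part of either library.

```python
from typing import Callable, Optional, Sequence


def _default_ocr(image, lang: str) -> str:
    # Imported lazily so the helpers can be exercised without Tesseract installed.
    import pytesseract
    return pytesseract.image_to_string(image, lang=lang)


def ocr_first_page(pages: Sequence, lang: str = "mar",
                   ocr: Optional[Callable] = None) -> str:
    """Option 1: OCR a single page. No Image.open() is needed, because each
    element of `pages` is already a PIL Image. Raises IndexError if empty."""
    ocr = ocr or _default_ocr
    return ocr(pages[0], lang)


def ocr_all_pages(pages: Sequence, lang: str = "mar",
                  ocr: Optional[Callable] = None) -> str:
    """Option 2: OCR every page and join the texts with newlines
    (the separator is an arbitrary choice)."""
    ocr = ocr or _default_ocr
    return "\n".join(ocr(page, lang) for page in pages)

# Usage (requires pdf2image, Poppler and Tesseract with Marathi data):
#   pages = pdf2image.convert_from_path(PDF_PATH)  # list of PIL Images
#   print(ocr_first_page(pages))
#   print(ocr_all_pages(pages))
```

For a one-page PDF such as the one in the question, both options produce the text of that single page.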