Facing the error "'list' object has no attribute 'read'" in pdf2image

Question

I have this code:

tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')

I want to do something like this:

tex=pytesseract.image_to_string(Image.open(image_path),lang='mar')

Code

from PIL import Image
import pytesseract
import cv2
#import cv
import os
import pdf2image
import time
#from pikepdf import Pdf,PdfImage,Name
#defpdftopil()
PDF_PATH=r'C:\Users\Downloads\ViewPDF (1)_one_page.pdf'
img=pdf2image.convert_from_path(PDF_PATH)
tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')
print(tex)
cv2.nameWindow("Input image")
cv2.imshow("input Image",img)
cv2.waitKey(0)
cv2.destroyWindow("Test")
cv2.destroyWindow("Main")

Error

Traceback (most recent call last):
  File "D:\System\p\Python\lib\site-packages\PIL\Image.py", line 2882, in open
    fp.seek(0)
AttributeError: 'list' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\ocr.py", line 12, in <module>
    tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')
  File "D:\System\p\Python\lib\site-packages\PIL\Image.py", line 2884, in open
    fp = io.BytesIO(fp.read())
AttributeError: 'list' object has no attribute 'read'

Answer 1

The line

    pdf2image.convert_from_path(PDF_PATH)

returns a list of images, one for each page. The pdf2image project description (https://pypi.org/project/pdf2image/) states:

    images = convert_from_path('/home/belval/example.pdf')

where images will be a list of PIL Images representing each page of the PDF document.
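The traceback follows directly from this. The sketch below reproduces it without needing a PDF: a list of blank in-memory PIL images stands in for the output of convert_from_path() (an illustrative assumption, not real pdf2image output). As the traceback shows, Image.open() first calls seek(0) on its argument and, when that fails, falls back to read(), which is why both 'seek' and 'read' appear in the error.

```python
from PIL import Image

# Hypothetical stand-in for pdf2image.convert_from_path(PDF_PATH):
# a list holding one PIL Image per page (two blank pages here).
pages = [Image.new("RGB", (100, 50), "white") for _ in range(2)]

print(type(pages))     # <class 'list'>
print(type(pages[0]))  # already a PIL Image, not a file path

error = None
try:
    # Image.open() tries fp.seek(0), then falls back to fp.read();
    # a list has neither method, so Pillow raises AttributeError.
    Image.open(pages)
except AttributeError as exc:
    error = exc
    print("AttributeError:", exc)
```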

Solution

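The PIL function Image.open() expects a file path or a file-like object, not a list. Each item in that list is already a PIL Image, so you do not need Image.open() at all: pytesseract.image_to_string() accepts a PIL Image directly. Therefore, you could do one of two things: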

1. Loop over the list returned by the convert_from_path() function and pass each list item (that is, each page image) to pytesseract.image_to_string().
2. If you are certain that your PDF contains only one page, pass only the first element of the list (index 0) returned by convert_from_path().
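Both options can be sketched as follows. The helper names ocr_pages() and ocr_first_page() are hypothetical, and the OCR step is passed in as a callable so the page-handling logic can be tried without Tesseract or Poppler installed; in real use that callable would wrap pytesseract.image_to_string(image, lang='mar').

```python
from typing import Callable, List

from PIL import Image


def ocr_pages(pages: List[Image.Image], ocr: Callable[[Image.Image], str]) -> List[str]:
    """Option 1 (hypothetical helper): OCR every page, one string per page."""
    # Each element is already a PIL Image, so no Image.open() call is needed.
    return [ocr(page) for page in pages]


def ocr_first_page(pages: List[Image.Image], ocr: Callable[[Image.Image], str]) -> str:
    """Option 2 (hypothetical helper): OCR only pages[0] of a one-page PDF."""
    if not pages:
        raise ValueError("the PDF produced no page images")
    return ocr(pages[0])
```

With pdf2image and pytesseract installed (plus the Poppler and Tesseract binaries and Tesseract's Marathi language data), a call would look like `texts = ocr_pages(pdf2image.convert_from_path(PDF_PATH), lambda img: pytesseract.image_to_string(img, lang='mar'))`, or `ocr_first_page(...)` with the same arguments for a one-page PDF.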

Answer 1 (2 votes), posted 2020-08-06 07:18:05

'list' object 没有属性 'read' 在 pdf2image 中面临此错误

问题描述

1 个解决方案

解决方案1 2 2020-08-06 07:18:05

解决方案1
2 2020-08-06 07:18:05