
'list' object has no attribute 'read': facing this error in pdf2image

I have this code:

tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')

I want it to behave like this call, which works when I pass an image file path:

tex=pytesseract.image_to_string(Image.open(image_path),lang='mar')

Code

from PIL import Image
import pytesseract
import cv2
#import cv
import os
import pdf2image
import time
#from pikepdf import Pdf,PdfImage,Name
#defpdftopil()
PDF_PATH=r'C:\Users\Downloads\ViewPDF (1)_one_page.pdf'
img=pdf2image.convert_from_path(PDF_PATH)
tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')
print(tex)
cv2.nameWindow("Input image")
cv2.imshow("input Image",img)
cv2.waitKey(0)
cv2.destroyWindow("Test")
cv2.destroyWindow("Main")

Error

Traceback (most recent call last):
  File "D:\System\p\Python\lib\site-packages\PIL\Image.py", line 2882, in open
    fp.seek(0)
AttributeError: 'list' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\ocr.py", line 12, in <module>
    tex=pytesseract.image_to_string(Image.open(pdf2image.convert_from_path(PDF_PATH)),lang='mar')
  File "D:\System\p\Python\lib\site-packages\PIL\Image.py", line 2884, in open
    fp = io.BytesIO(fp.read())
AttributeError: 'list' object has no attribute 'read'

The line

    pdf2image.convert_from_path(PDF_PATH)

returns a list of images, one for each page. The pdf2image project description ( https://pypi.org/project/pdf2image/ ) states:

    images = convert_from_path('/home/belval/example.pdf')

where images will be a list of PIL Image representing each page of the PDF document.

Solution

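The PIL function Image.open() expects a filename or a file-like object, not a list, which is why it fails when it tries to call seek() and read() on the list. The items in the list are already PIL images, so Image.open() is not needed at all. Therefore, you could do one of two things: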

  1. Loop over the list returned by convert_from_path() and pass each list item (that is, each page image) directly to pytesseract.image_to_string().
  2. If you are certain that your PDF contains only one page, access just the first element (index 0) of the list returned by convert_from_path().
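
Both options can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name ocr_pdf and its convert and ocr parameters are hypothetical and exist only so the function can be tried without Poppler or Tesseract installed. By default it uses pdf2image.convert_from_path() and pytesseract.image_to_string(), and it assumes the 'mar' language data from the question is available to Tesseract.

```python
def ocr_pdf(pdf_path, lang="mar", convert=None, ocr=None):
    """Return a list with the OCR text of each page of a PDF.

    `convert` and `ocr` are hypothetical injection points (not part of
    pdf2image or pytesseract); they default to the real library calls.
    """
    if convert is None:
        # Imported lazily so the sketch can be exercised with stand-ins.
        from pdf2image import convert_from_path as convert
    if ocr is None:
        from pytesseract import image_to_string as ocr

    # convert_from_path() returns a list of PIL images, one per page.
    pages = convert(pdf_path)

    # Each item is already a PIL image, so it goes straight to
    # image_to_string(); no Image.open() call is involved.
    return [ocr(page, lang=lang) for page in pages]
```

With this helper, texts = ocr_pdf(PDF_PATH) covers option 1 (one string per page), and texts[0] covers option 2 for a single-page PDF.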
