Getting an error while extracting text from an image of type 'PIL.PpmImagePlugin.PpmImageFile' using pytesseract
I am trying to extract text from an image whose type is 'PIL.PpmImagePlugin.PpmImageFile' using pytesseract. The code and the error are below:
from pdf2image import convert_from_path
from PIL import Image
import pytesseract as pyt

pages = convert_from_path('D:/pdf_csv/HealthCare/eRDS - ML/eRDS - ML/2001468/2001468,69,70.pdf', poppler_path='C:/Users/Hp/poppler-0.68.0/bin')
text = pyt.image_to_string(Image.open(pages[0]), lang='eng')  # this line raises the AttributeError
The error I am getting:
AttributeError: 'PpmImageFile' object has no attribute 'read'
Alternatively, is there any way to convert the PpmImageFile to 'jpg' or 'png' format?
Add fmt='jpeg' or fmt='png' to your function call to get non-PPM images from pdf2image.
In your example, change
pages = convert_from_path('D:/pdf_csv/Health....001468,69,70.pdf',poppler_path='C:/Users/Hp/poppler-0.68.0/bin')
to
pages = convert_from_path('D:/pdf_csv/Health...001468,69,70.pdf', fmt='jpeg', poppler_path='C:/Users/Hp/poppler-0.68.0/bin')
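Separately from the output format, it may help to note that convert_from_path already returns PIL image objects. Image.open expects a file path or a file-like object, which appears to be why it complains about a missing 'read' attribute when given pages[0]. A minimal sketch, assuming pdf2image, pytesseract and the Tesseract binary are installed: the helper names ocr_first_page and page_to_png_bytes are hypothetical and not part of either library. The first passes the page straight to pytesseract, and the second shows one way to get PNG data from a page if a different format is really needed.

```python
import io


def ocr_first_page(pdf_path, poppler_path=None, lang="eng"):
    """Render the first page of a PDF and run OCR on it (hypothetical helper)."""
    # Imported lazily so this module can be loaded without the OCR stack installed.
    from pdf2image import convert_from_path
    import pytesseract

    # convert_from_path returns a list of PIL images; only page 1 is rendered here.
    pages = convert_from_path(
        pdf_path, first_page=1, last_page=1, poppler_path=poppler_path
    )
    # The page is already a PIL image, so no Image.open() call is needed.
    return pytesseract.image_to_string(pages[0], lang=lang)


def page_to_png_bytes(page):
    """Encode a PIL image (e.g. a PpmImageFile) as PNG bytes in memory."""
    buffer = io.BytesIO()
    page.save(buffer, format="PNG")
    return buffer.getvalue()
```

For example, text = ocr_first_page('some.pdf', poppler_path='C:/Users/Hp/poppler-0.68.0/bin') would return the recognized text, where 'some.pdf' stands in for the real file path.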