Getting error while extracting text from an image of type 'PIL.PpmImagePlugin.PpmImageFile' using pytesseract

I am trying to extract text from an image of type 'PIL.PpmImagePlugin.PpmImageFile' using pytesseract. The code and the error are below:

import pytesseract as pyt
from PIL import Image
from pdf2image import convert_from_path

pages = convert_from_path('D:/pdf_csv/HealthCare/eRDS - ML/eRDS - ML/2001468/2001468,69,70.pdf',poppler_path='C:/Users/Hp/poppler-0.68.0/bin')
text = pyt.image_to_string(Image.open(pages[0]), lang='eng')

Error I am getting:

AttributeError: 'PpmImageFile' object has no attribute 'read'

Alternatively, is there any way to convert the PpmImageFile to 'jpg' or 'png' format?
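For context: convert_from_path returns a list of PIL images that are already loaded in memory, so pages[0] is an Image object, not a filename or file object. Image.open() expects a path or a file object, which is why it ends up calling .read() on the image and raises the AttributeError. Converting to PNG or JPEG is still possible with Pillow alone by re-encoding into an in-memory buffer. A minimal sketch, assuming only Pillow is installed; the blank Image.new() image is a hypothetical stand-in for pages[0], and the helper names to_png_bytes and to_jpeg_image are illustrative, not part of any library:

```python
import io

from PIL import Image


def to_png_bytes(img: Image.Image) -> bytes:
    """Re-encode any PIL image (e.g. a PpmImageFile) as PNG bytes in memory."""
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()


def to_jpeg_image(img: Image.Image) -> Image.Image:
    """Return a JPEG-encoded copy of img, reopened as a PIL image."""
    buf = io.BytesIO()
    # JPEG has no alpha channel, so normalise to RGB before saving
    img.convert("RGB").save(buf, format="JPEG")
    buf.seek(0)
    # Image.open is correct here: buf is a file-like object, not an Image
    return Image.open(buf)


if __name__ == "__main__":
    # Hypothetical stand-in for pages[0]: a small blank RGB image
    page = Image.new("RGB", (40, 20), "white")
    try:
        Image.open(page)  # wrong: page is already an Image, not a path/file
    except AttributeError as exc:
        print("Image.open(page) failed:", exc)
    print(to_png_bytes(page)[:8] == b"\x89PNG\r\n\x1a\n")
    print(to_jpeg_image(page).format)
```

For OCR itself no conversion is needed, though: pytesseract.image_to_string accepts a PIL image directly.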

Add fmt='jpeg' or fmt='png' to your convert_from_path call to get non-PPM images from pdf2image.

In your example, change

pages = convert_from_path('D:/pdf_csv/Health....001468,69,70.pdf',poppler_path='C:/Users/Hp/poppler-0.68.0/bin')

to

pages = convert_from_path('D:/pdf_csv/Health...001468,69,70.pdf', fmt='jpeg', poppler_path='C:/Users/Hp/poppler-0.68.0/bin')
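Putting the answer together, here is a minimal sketch that requests PNG pages from pdf2image and hands each page straight to pytesseract. Whichever fmt is used, each element of pages is still an in-memory PIL image, so it should be passed directly to image_to_string; keeping the Image.open(pages[0]) wrapper would raise the same kind of AttributeError. The helper name ocr_pdf and its defaults are hypothetical, and the sketch assumes pdf2image, pytesseract, Poppler and Tesseract are all installed.

```python
def ocr_pdf(pdf_path, poppler_path=None, lang="eng", fmt="png"):
    """OCR every page of a PDF and return one string per page (hypothetical helper).

    Assumes pdf2image, pytesseract, Poppler and Tesseract are installed.
    """
    # Imported inside the function so the helper can be defined
    # even where the optional OCR dependencies are missing
    from pdf2image import convert_from_path
    import pytesseract

    # fmt="png" asks pdf2image for PNG pages instead of the default PPM;
    # either way each element of `pages` is an in-memory PIL image
    pages = convert_from_path(pdf_path, fmt=fmt, poppler_path=poppler_path)

    texts = []
    for page in pages:
        # Pass the PIL image straight to pytesseract; wrapping it in
        # Image.open() is what raises "... has no attribute 'read'"
        texts.append(pytesseract.image_to_string(page, lang=lang))
    return texts
```

Called with the paths from the question, ocr_pdf('D:/pdf_csv/HealthCare/eRDS - ML/eRDS - ML/2001468/2001468,69,70.pdf', poppler_path='C:/Users/Hp/poppler-0.68.0/bin') returns a list with one string per page, where element 0 corresponds to the original pages[0] call.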
