I have a collection of PDFs of different sizes, each containing a scan of an A4 page. I would like to convert each one to an image with a fixed output resolution.
My code to convert to jpg (without resizing):
from pdf2image import convert_from_path
filename_in = 'myfile.pdf'
filename_out = 'myfile.jpg'
# convert_from_path returns a list of PIL images, one per page
pages = convert_from_path(filename_in)
# save only the first page as a JPEG
pages[0].save(filename_out, 'JPEG')
If the pdf I am trying to convert has any colour in it, the above does not work and the outgoing image is completely white (with non-zero dimensions). Is this a known problem and does a solution exist?
I am using Python 3.7.3.
I am unable to share the pdf files as they contain private information.
Instead of rendering the PDF pages, you can try extracting the embedded images and correcting their resolution.
Try pdfreader. Here is sample code that extracts all images (both inline and XObject) from a document:
from pdfreader import SimplePDFViewer, PageDoesNotExist
fd = open(your_pdf_file_name, "rb")
viewer = SimplePDFViewer(fd)
images = []
try:
    # walk the pages until the viewer runs past the last one
    while True:
        viewer.render()
        images.extend(viewer.canvas.inline_images)
        images.extend(viewer.canvas.images.values())
        viewer.next()
except PageDoesNotExist:
    pass
Then you can convert the images to PIL/Pillow objects and save them (or do whatever else you need):
for i, img in enumerate(images):
    img.to_Pillow().save("{}.png".format(i))
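The step of correcting the resolution is not shown above. Here is a minimal sketch of one way to do it, assuming the goal is to fit every extracted scan into the same pixel box while keeping its aspect ratio. The helper name fit_size and the 1240 x 1754 target (roughly A4 at 150 dpi) are illustrative choices and are not part of pdfreader or Pillow.

```python
def fit_size(width, height, target_width=1240, target_height=1754):
    """Return (new_width, new_height) that fits inside the target box.

    The aspect ratio of the input is preserved, so one side may end up
    smaller than the target. The defaults are an assumed target of
    roughly A4 at 150 dpi.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # use the smaller scale factor so neither side exceeds the target box
    scale = min(target_width / width, target_height / height)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height


if __name__ == "__main__":
    # a 300 dpi A4 scan (2480 x 3508) scaled down to the 150 dpi box
    print(fit_size(2480, 3508))
```

With Pillow, the result can be passed straight to resize: for a Pillow image pil_img, pil_img.resize(fit_size(*pil_img.size)) should give an image of the computed size, since Image.size is a (width, height) tuple.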