Convert PDF file to multipage image

I'm trying to convert a multipage PDF file to an image with PyMuPDF:

pdffile = "input.pdf"
doc = fitz.open(pdffile)
page = doc.loadPage()  # number of page
pix = page.getPixmap()
output = "output.tif"
pix.writePNG(output)

But I need to convert all the pages of the PDF file into a single multi-page TIFF image. When I give the page argument a page range, it still renders just one page. Does anyone know how I can do this?

To convert all pages of the PDF, you need a for loop. Also, when you render a page to a pixmap, pass a matrix (matrix = mat) to increase the resolution. Here is a code snippet (not sure if this is what you wanted, but it converts every page of the PDF to a separate image):

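import os
import fitz  # PyMuPDF

pdf_file = "input.pdf"
doc = fitz.open(pdf_file)
zoom = 2  # to increase the resolution
mat = fitz.Matrix(zoom, zoom)
noOfPages = doc.page_count  # pageCount in older PyMuPDF releases
image_folder = '/path/to/where/to/save/your/images'

for pageNo in range(noOfPages):
    page = doc.load_page(pageNo)  # 0-based page index (loadPage in older releases)
    pix = page.get_pixmap(matrix=mat)  # getPixmap in older releases

    # os.path.join adds the path separator; the extension selects the image format
    output = os.path.join(image_folder, str(pageNo) + '.png')
    pix.save(output)  # writePNG in older releases
    print('Converting PDF page to image ... ' + output)
    # do your things afterwards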
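
When writing one file per page as in the loop above, file names built from the bare page number sort as 0, 1, 10, 11, 2, ... in most file listings. A minimal sketch of zero-padded names, assuming you want them to sort in page order; the helper name `page_image_path` is made up for this example and is not part of PyMuPDF:

```python
import os

def page_image_path(folder, page_no, total_pages, ext="png"):
    """Build a zero-padded per-page file name so that files sort in page order."""
    # Pad to the number of digits of the highest 0-based page index.
    width = len(str(max(total_pages - 1, 0)))
    return os.path.join(folder, f"{page_no:0{width}d}.{ext}")

# For a 12-page document, page index 2 becomes "02.png" inside the folder.
print(os.path.basename(page_image_path("images", 2, 12)))  # 02.png
```

In the loop, `output = page_image_path(image_folder, pageNo, noOfPages)` would replace the string concatenation.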

For resolution, there is a good example on GitHub that demonstrates what it means and how it can be used in your case, if needed.
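
As a rough illustration of what the zoom factor means: PDF page coordinates are measured in points (1/72 inch), so an unscaled render corresponds to 72 dpi and a zoom of 2 to about 144 dpi. The helper names below (`zoom_for_dpi`, `pixel_size`) are invented for this sketch and are not PyMuPDF functions; the pixel size is an approximation of what the rendered pixmap would have.

```python
BASE_DPI = 72  # PDF user space: 1 point = 1/72 inch

def zoom_for_dpi(dpi):
    """Zoom factor to pass to fitz.Matrix(zoom, zoom) for a target resolution."""
    return dpi / BASE_DPI

def pixel_size(width_pt, height_pt, zoom):
    """Approximate pixmap size for a page whose size is given in points."""
    return round(width_pt * zoom), round(height_pt * zoom)

print(zoom_for_dpi(144))        # 2.0
print(pixel_size(612, 792, 2))  # (1224, 1584) for a US Letter page
```

So for a 300 dpi render, the zoom would be 300 / 72, roughly 4.17, rather than 2.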

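import fitz  # PyMuPDF
from PIL import Image

input_pdf = "input.pdf"
output_name = "output.tif"
# Compression names as documented by Pillow, e.g. "tiff_adobe_deflate",
# "tiff_lzw", "group4" (group4 needs a binarized image)
compression = "tiff_adobe_deflate"

zoom = 2  # to increase the resolution
mat = fitz.Matrix(zoom, zoom)

doc = fitz.open(input_pdf)
image_list = []
for page in doc:
    # get_pixmap (getPixmap in older releases) renders RGB without alpha by default
    pix = page.get_pixmap(matrix=mat)
    img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
    image_list.append(img)

if image_list:
    # Save the first page and append the remaining pages as extra TIFF frames
    image_list[0].save(
        output_name,
        save_all=True,
        append_images=image_list[1:],
        compression=compression,
        dpi=(300, 300),  # resolution metadata only; the render above is zoom * 72 dpi
    )

import fitz  # PyMuPDF

pdffile = "input.pdf"
doc = fitz.open(pdffile)
for page in doc:
    pix = page.get_pixmap()
    # Pixmap.save() cannot write TIFF, and a fixed file name would be
    # overwritten on every iteration, so write one PNG per page instead.
    output = "output-%i.png" % page.number
    pix.save(output)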

PyMuPDF supports a limited set of image types for output. TIFF is not among them.

However, there is an easy way to interface with Pillow, which supports multiframe TIFF output.
