
Convert PDF file to multipage image

I'm trying to convert a multipage PDF file to an image with PyMuPDF:

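import fitz  # PyMuPDF

pdffile = "input.pdf"
doc = fitz.open(pdffile)
page = doc.load_page(0)  # load_page() takes a single page number
pix = page.get_pixmap()
output = "output.tif"
pix.save(output, "png")  # writes PNG data, even though the file name ends in .tif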

But I need to convert all the pages of the PDF file into a single multi-page TIFF image. When I give the page argument a page range, it only takes one page. Does anyone know how I can do it?
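
load_page() (loadPage() in older PyMuPDF releases) accepts one page number, not a range, so converting several pages means expanding the range into individual page numbers and loading each page in a loop. A minimal sketch of that expansion, assuming a hypothetical 1-based "1-3,5" range syntax; expand_page_range is an illustrative helper, not part of PyMuPDF:

```python
from __future__ import annotations


def expand_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a 1-based spec such as "1-3,5" into 0-based page indices."""
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty items such as a trailing comma
        if "-" in part:
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
        else:
            start = end = int(part)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"page range {part!r} is outside 1..{page_count}")
        # 1-based inclusive numbers -> 0-based indices for load_page()
        pages.extend(range(start - 1, end))
    return pages


# Each index would then be loaded in a loop, e.g. doc.load_page(i).
print(expand_page_range("1-3,5", page_count=10))  # [0, 1, 2, 4]
```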

To convert all pages of the PDF, you need a for loop. Also, when you call .getPixmap() (.get_pixmap() in current PyMuPDF releases), you can pass a matrix argument such as matrix=mat to increase the resolution. Here is a code snippet (not sure if this is exactly what you want, but it converts every page of the PDF to an image):

import os
import fitz  # PyMuPDF

pdf_file = "input.pdf"
doc = fitz.open(pdf_file)
zoom = 2  # to increase the resolution
mat = fitz.Matrix(zoom, zoom)
no_of_pages = doc.page_count
image_folder = '/path/to/where/to/save/your/images'

for page_no in range(no_of_pages):
    page = doc.load_page(page_no)  # page number (0-based)
    pix = page.get_pixmap(matrix=mat)

    # os.path.join adds the separator that plain string concatenation would miss
    output = os.path.join(image_folder, str(page_no) + '.png')  # you could change the image format accordingly
    pix.save(output)  # the image format is taken from the file extension
    print('Converting PDF page to image ... ' + output)
    # do your things afterwards
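
The zoom factor maps directly to a resolution: PyMuPDF renders at 72 dpi when no matrix is given, so fitz.Matrix(zoom, zoom) gives about 72 * zoom dpi (zoom = 2 is 144 dpi). A small sketch of that arithmetic, assuming you start from a target DPI; zoom_for_dpi and pixel_size are hypothetical helpers, and the pixmap PyMuPDF actually returns may differ from these estimates by a pixel because of rounding:

```python
from __future__ import annotations

# A PDF point is 1/72 inch, and PyMuPDF renders at 72 dpi when no matrix is given.
PDF_POINTS_PER_INCH = 72


def zoom_for_dpi(target_dpi: float) -> float:
    """Zoom factor for fitz.Matrix(zoom, zoom) that gives roughly target_dpi."""
    return target_dpi / PDF_POINTS_PER_INCH


def pixel_size(width_pt: float, height_pt: float, zoom: float) -> tuple[int, int]:
    """Approximate pixmap size in pixels for a page measured in points."""
    return round(width_pt * zoom), round(height_pt * zoom)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(zoom_for_dpi(144))             # 2.0
print(pixel_size(612, 792, zoom=2))  # (1224, 1584)
print(round(zoom_for_dpi(300), 2))   # 4.17
```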

For resolution, here is a good example from GitHub that demonstrates what it means and how it could be used in your case if needed.

import fitz  # PyMuPDF
from PIL import Image

input_pdf = "input.pdf"
output_name = "output.tif"
# Pillow TIFF compression, e.g. "tiff_adobe_deflate" (zip) or "tiff_lzw";
# "group4" needs a binarized (mode "1") image
compression = "tiff_adobe_deflate"

zoom = 2  # to increase the resolution
mat = fitz.Matrix(zoom, zoom)

doc = fitz.open(input_pdf)
image_list = []
for page in doc:
    pix = page.get_pixmap(matrix=mat)  # RGB without alpha by default
    img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    image_list.append(img)

if image_list:
    # save the first page and append the remaining pages as extra TIFF frames
    image_list[0].save(
        output_name,
        save_all=True,
        append_images=image_list[1:],
        compression=compression,
        dpi=(72 * zoom, 72 * zoom),  # matches the rendering resolution (72 dpi * zoom)
    )
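
The Image.frombytes("RGB", ...) call works because get_pixmap() returns an RGB pixmap without alpha by default, so pix.samples holds width * height * 3 bytes. If you render in grayscale instead (for example as a first step toward the binarized image that "group4" compression needs), the Pillow mode has to change with it. A minimal sketch of that check, assuming a pixmap without an alpha channel; pillow_mode and check_samples are hypothetical helpers, not PyMuPDF or Pillow API:

```python
def pillow_mode(n: int, alpha: bool) -> str:
    """Pillow mode for a pixmap with n color components and no alpha channel."""
    if alpha:
        # This sketch only covers the get_pixmap() default of alpha=False.
        raise ValueError("render with alpha=False for this sketch")
    modes = {1: "L", 3: "RGB"}  # grayscale, RGB
    if n not in modes:
        raise ValueError(f"no Pillow mode handled here for n={n}")
    return modes[n]


def check_samples(width: int, height: int, n: int, samples: bytes) -> None:
    """Raise if the sample buffer does not hold width * height * n bytes."""
    expected = width * height * n
    if len(samples) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(samples)}")


# With a real pixmap (not run here) this would be used as:
#   mode = pillow_mode(pix.n, pix.alpha)
#   check_samples(pix.width, pix.height, pix.n, pix.samples)
#   img = Image.frombytes(mode, (pix.width, pix.height), pix.samples)
print(pillow_mode(3, False))       # RGB
check_samples(2, 2, 3, bytes(12))  # 2 x 2 RGB pixels = 12 bytes, no error
```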
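
import fitz  # PyMuPDF

pdffile = "input.pdf"
doc = fitz.open(pdffile)
for page in doc:
    pix = page.get_pixmap()
    # one file per page: a fixed name would be overwritten on every iteration,
    # and Pixmap.save() cannot write TIFF, so PNG is used here
    output = f"output-{page.number}.png"
    pix.save(output)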

PyMuPDF supports a limited set of image types for output, and TIFF is not among them.

However, there is an easy way to interface with Pillow, which supports multi-frame TIFF output.
