Convert a PDF file to a multi-page image

Question

I am trying to convert a multi-page PDF file to images with PyMuPDF:

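import fitz  # PyMuPDF

pdffile = "input.pdf"
doc = fitz.open(pdffile)
page = doc.load_page(0)  # number of page (0-based); only this one page is rendered
pix = page.get_pixmap()
pix.save("output.tif", output="png")  # same as the old writePNG(): PNG data despite the .tif name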

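But I need to convert all pages of the PDF file into a single image, a multi-page TIFF. When I give the page argument a range of pages, it only takes one page. Does anyone know how I can do this?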

Answer 1

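To convert all pages of the PDF, you need a for loop. In addition, when you call .get_pixmap() (.getPixmap() in older PyMuPDF releases), pass a transformation matrix such as matrix=mat to increase the rendering resolution. Here is a code snippet (not sure whether it is exactly what you want, but it converts every page of the PDF to an image):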

import os
import fitz  # PyMuPDF

pdf_file = "input.pdf"
image_folder = '/path/to/where/to/save/your/images'

doc = fitz.open(pdf_file)
zoom = 2  # to increase the resolution: 2 x the default 72 dpi = 144 dpi
mat = fitz.Matrix(zoom, zoom)

for page_no in range(doc.page_count):
    page = doc.load_page(page_no)  # page numbers are 0-based
    pix = page.get_pixmap(matrix=mat)

    # os.path.join supplies the path separator the string concatenation lacked;
    # the file extension selects the output format (PNG is always supported)
    output = os.path.join(image_folder, str(page_no) + '.png')
    pix.save(output)
    print('Converting PDF page to image ... ' + output)
    # do your things afterwards

For clarification, here is a good example from GitHub that demonstrates what this means and how it can be applied to your case if needed.
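The effect of zoom can be worked out by hand: PDF page sizes are measured in points (72 per inch), and PyMuPDF renders at 72 dpi when no matrix is given, so zoom = 2 yields 144 dpi. Below is a minimal sketch of that arithmetic; the helper names zoom_for_dpi and rendered_size are hypothetical (not part of PyMuPDF), and the pixel size is only approximate because PyMuPDF does its own rounding of the page rectangle.

```python
from typing import Tuple

# PDF user space has 72 points per inch, and PyMuPDF's default rendering
# (identity matrix) is 72 dpi, so the zoom factor is simply dpi / 72.
PDF_POINTS_PER_INCH = 72


def zoom_for_dpi(dpi: float) -> float:
    """Hypothetical helper: zoom factor for fitz.Matrix(zoom, zoom) at a target dpi."""
    return dpi / PDF_POINTS_PER_INCH


def rendered_size(width_pt: float, height_pt: float, zoom: float) -> Tuple[int, int]:
    """Hypothetical helper: approximate pixel size of a page rendered with a uniform zoom."""
    return round(width_pt * zoom), round(height_pt * zoom)
```

For example, a US Letter page (612 x 792 pt) rendered with zoom = 2 comes out at roughly 1224 x 1584 pixels, and a 300 dpi scan-like output needs zoom_for_dpi(300), about 4.17.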

Answer 2

import fitz  # PyMuPDF
from PIL import Image

input_pdf = "input.pdf"
output_name = "output.tif"
# ZIP/Deflate; other Pillow names: "tiff_lzw", or "group4" (needs a binarized 1-bit image)
compression = "tiff_adobe_deflate"

zoom = 2  # to increase the resolution: 2 x 72 dpi = 144 dpi
mat = fitz.Matrix(zoom, zoom)

doc = fitz.open(input_pdf)
image_list = []
for page in doc:
    pix = page.get_pixmap(matrix=mat)  # RGB without alpha by default
    img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    image_list.append(img)

if image_list:
    # The first page is saved normally; the remaining pages become extra TIFF frames
    image_list[0].save(
        output_name,
        save_all=True,
        append_images=image_list[1:],
        compression=compression,
        dpi=(72 * zoom, 72 * zoom),  # declare the resolution actually rendered
    )
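The compression comment notes that "group4" only works on a binarized image. A minimal sketch of that step with Pillow, assuming a fixed grayscale threshold is acceptable for your pages (text documents usually are); the function name save_group4_tiff and the threshold of 128 are hypothetical choices, and a fixed threshold is used instead of Pillow's default dithering in convert("1"), which tends to speckle text.

```python
from typing import List

from PIL import Image


def save_group4_tiff(pages: List[Image.Image], path: str, threshold: int = 128) -> None:
    """Hypothetical helper: binarize pages and write one CCITT Group 4 multi-page TIFF."""
    if not pages:
        raise ValueError("no pages to save")
    # Group 4 can only encode 1-bit images: convert to grayscale first, then map
    # every pixel to pure white (255) or black (0) with a fixed threshold
    binarized = [
        page.convert("L").point(lambda v: 255 if v >= threshold else 0, mode="1")
        for page in pages
    ]
    binarized[0].save(
        path,
        save_all=True,
        append_images=binarized[1:],
        compression="group4",  # requires Pillow built with libtiff (the default wheels are)
    )
```

In the answer's loop, this would replace the final save: save_group4_tiff(image_list, output_name).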

Answer 3

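import fitz  # PyMuPDF

pdffile = "input.pdf"
doc = fitz.open(pdffile)
for page in doc:
    pix = page.get_pixmap()
    # Pixmap.save cannot write TIFF, and a fixed file name would be overwritten
    # on every iteration, so write one numbered PNG per page instead
    output = f"output-{page.number}.png"
    pix.save(output)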

Answer 4

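PyMuPDF supports only a limited set of image types for output, and TIFF is not one of them.

However, there is a simple way to interface with Pillow, which does support multi-frame TIFF output.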
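A minimal sketch of that PyMuPDF-to-Pillow handoff, split into a rendering step and a TIFF-writing step. It assumes PyMuPDF 1.19 or later (snake_case names such as get_pixmap) and Pillow; the function names pdf_pages_as_images and save_multiframe_tiff are hypothetical. The fitz import sits inside the rendering function so the TIFF helper can be used without PyMuPDF installed.

```python
from typing import Iterable, Iterator

from PIL import Image


def pdf_pages_as_images(pdf_path: str, zoom: float = 2) -> Iterator[Image.Image]:
    """Hypothetical helper: render each PDF page with PyMuPDF and hand it to Pillow."""
    import fitz  # PyMuPDF; imported lazily so save_multiframe_tiff works without it

    mat = fitz.Matrix(zoom, zoom)
    doc = fitz.open(pdf_path)
    try:
        for page in doc:
            # alpha=False gives plain RGB samples (3 bytes per pixel), which is
            # exactly the layout Image.frombytes("RGB", ...) expects
            pix = page.get_pixmap(matrix=mat, alpha=False)
            yield Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    finally:
        doc.close()


def save_multiframe_tiff(images: Iterable[Image.Image], path: str) -> int:
    """Hypothetical helper: write images as frames of one TIFF; returns the frame count."""
    frames = list(images)
    if not frames:
        raise ValueError("no images to save")
    # Pillow writes the first image and appends the rest as additional frames
    frames[0].save(path, save_all=True, append_images=frames[1:])
    return len(frames)
```

Usage: save_multiframe_tiff(pdf_pages_as_images("input.pdf"), "output.tif"). All pages are held in memory before saving because Pillow needs the first frame and the remaining ones separately, so very long documents need a lot of RAM.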

4 answers

Answer 1: score 2, 2020-10-13 21:27:57
Answer 2: score 2, 2021-02-12 15:25:42
Answer 3: score 0, 2022-08-31 21:12:17
Answer 4: score 0, 2022-09-04 15:26:48