I have a function that gets a page from a PDF file via pyPdf2
and should convert the first page to a png (or jpg) with Pillow
(PIL Fork)
from PyPDF2 import PdfFileWriter, PdfFileReader
import os
from PIL import Image
import io
# Open PDF Source #
app_path = os.path.dirname(__file__)
src_pdf= PdfFileReader(open(os.path.join(app_path, "../../../uploads/%s" % filename), "rb"))
# Get the first page of the PDF #
dst_pdf = PdfFileWriter()
dst_pdf.addPage(src_pdf.getPage(0))
# Create BytesIO #
pdf_bytes = io.BytesIO()
dst_pdf.write(pdf_bytes)
pdf_bytes.seek(0)
file_name = "../../../uploads/%s_p%s.png" % (name, pagenum)
img = Image.open(pdf_bytes)
img.save(file_name, 'PNG')
pdf_bytes.flush()
That results in an error:
OSError: cannot identify image file <_io.BytesIO object at 0x0000023440F3A8E0>
I found some threads with a similar issue, ( PIL open() method not working with BytesIO ) but I cannot see where I am wrong here, as I have pdf_bytes.seek(0)
already added.
Any hints appreciated
Per document:
write(stream) Writes the collection of pages added to this object out as a PDF file.
Parameters: stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.
So the object pdf_bytes contains a PDF file, not an image file.
The reason why there are codes like above work is: sometimes, the pdf file just contains a jpeg file as its content. If your pdf is just a normal pdf file, you can't just read the bytes and parse it as an image.
And refer to as a more robust implementation: https://stackoverflow.com/a/34116472/334999
import glob, sys, fitz
# To get better resolution
zoom_x = 2.0 # horizontal zoom
zoom_y = 2.0 # vertical zoom
mat = fitz.Matrix(zoom_x, zoom_y) # zoom factor 2 in
each dimension
filename = "/xyz/abcd/1234.pdf" # name of pdf file you want to render
doc = fitz.open(filename)
for page in doc:
pix = page.get_pixmap(matrix=mat) # render page to an image
pix.save("/xyz/abcd/1234.png") # store image as a PNG
Credit
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.