Python转换PDF

Question

我有以下代码，可以从一个多页PDF中创建多个jpg。 但是，我收到以下错误： wand.exceptions.BlobError: unable to open image '{uuid}.jpg': No such file or directory @ error/blob.c/OpenBlob/2841但是图像已创建。 最初我以为可能是比赛条件，所以我输入了time.sleep()但是那也不起作用，所以我不相信就是这样。 谁看过这个吗？

def split_pdf(pdf_obj, step_functions_client, task_token):
    print(time.time())

    read_pdf = PyPDF2.PdfFileReader(pdf_obj)
    images = []

    for page_num in range(read_pdf.numPages):
        output = PyPDF2.PdfFileWriter()
        output.addPage(read_pdf.getPage(page_num))

        generateduuid = str(uuid.uuid4())
        filename = generateduuid + ".pdf"
        outputfilename = generateduuid + ".jpg"
        with open(filename, "wb") as out_pdf:
            output.write(out_pdf) # write to local instead

        image = {"page": str(page_num + 1)}  # Start at 1 rather than 0

        create_image_process = subprocess.Popen(["gs", "-o " + outputfilename, "-sDEVICE=jpeg", "-r300", "-dJPEGQ=100", filename], stdout=subprocess.PIPE)
        create_image_process.wait()

        time.sleep(10)
        with(Image(filename=outputfilename)) as img:
            image["image_data"] = img.make_blob('jpeg')
            image["height"] = img.height
            image["width"] = img.width
            images.append(image)

            if hasattr(step_functions_client, 'send_task_heartbeat'):
                step_functions_client.send_task_heartbeat(taskToken=task_token)

    return images

Answer 1

尝试首先打开PDF时，似乎没有传递值-因此，您收到的错误。

确保使用完整的文件路径格式化字符串，例如f'/path/to/file/{uuid}.jpg'或'/path/to/file/{}.jpg'.format(uuid)

Answer 2

我真的不明白为什么要使用PyPDF2，GhostScript和魔杖。 您无需解析/操作任何PostScript，并且Wand位于ImageMagick的顶部，而ImageMagick则位于ghostscript的顶部。 您也许可以将功能缩减为一个PDF实用程序。

def split_pdf(pdf_obj, step_functions_client, task_token):
    images = []
    with Image(file=pdf_obj, resolution=300) as document:
        for index, page in enumerate(document.sequence):
            image = {
                "page": index + 1,
                "height": page.height,
                "width": page.width,
            }
            with Image(page) as frame:
                image["image_data"] = frame.make_blob("JPEG")
            images.append(image)
            if hasattr(step_functions_client, 'send_task_heartbeat'):
                step_functions_client.send_task_heartbeat(taskToken=task_token)
    return images

最初我以为可能是比赛条件，所以我输入了time.sleep（），但是那也不起作用，所以我不相信就是这样。 谁看过这个吗？

该示例代码没有任何错误处理。 PDF可以由许多软件供应商生成，并且它们中的许多工作都很草率。 PyPDF或Ghostscript失败的可能性很大，而您再也没有机会解决这个问题。

例如，当我将Ghostscript用于随机网站生成的PDF时，我经常在stderr上看到以下消息...

ignoring zlib error: incorrect data check

...导致文档不完整或空白页。

另一个常见的示例是系统资源已用尽，无法分配额外的内存。 Web服务器一直在发生这种情况，解决方案通常是将任务迁移到队列工作器，该工作器可以在每次任务完成时彻底关闭。

Python转换PDF

问题描述

2 个解决方案

解决方案1
0 2018-07-31 00:23:10

解决方案2
0 2018-07-31 13:04:05

Python转换PDF

问题描述

2 个解决方案

解决方案1 0 2018-07-31 00:23:10

解决方案2 0 2018-07-31 13:04:05

解决方案1
0 2018-07-31 00:23:10

解决方案2
0 2018-07-31 13:04:05