Python Wand.image PDF to JPG in memory converter

I am trying to write some code that will convert a PDF that resides on the web into a series of JPGs.

I have working code that:

1) takes a PDF

2) saves it to disk

3) converts it to JPGs, which are saved to disk.

Is there a way to write the same code (my attempt below throws an error) so that it takes the PDF from the internet but keeps it in memory (so the program never writes to or reads from disk), then converts it to JPGs (which are to be uploaded to AWS S3)?

I was thinking this would work:

f = urlopen("https://s3.us-east-2.amazonaws.com/converted1jpgs/example.pdf") #file to process

But I get the following error:

"Exception TypeError: TypeError("object of type 'NoneType' has no len()",) in > ignored"

Full code, along with the actual PDF file that I want converted. Note: the code works if I replace f= with the location of a PDF saved on disk:

from urllib2 import urlopen
from wand.image import Image

#location on disk
save_location = "/home/bob/Desktop/pdfs to convert/example1"

#file prefix
test_id = 'example'
print 1
f = urlopen("https://s3.us-east-2.amazonaws.com/converted1jpgs/example.pdf")
print 2
print type(f)

with Image(filename=f) as img:
    print('pages = ', len(img.sequence))
    with img.convert('jpg') as converted:
        converted.save(filename=save_location+"/"+test_id+".jpg")

The result of urlopen obviously isn't a filename, so you can't pass in filename=f and expect it to work.

I don't have Wand installed, but from the docs, there are clearly a bunch of alternative ways to construct it.
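To make the distinction concrete, here is a small sketch of how one might pick the right keyword for a given source. The helper name image_source_kwargs is hypothetical (it is not part of Wand); the keywords filename, file, and blob are the ones the Wand Image constructor accepts. It is written for Python 3.

```python
def image_source_kwargs(source):
    """Return the Image(...) keyword argument matching the kind of `source`.

    Hypothetical helper for illustration only:
      - a str is treated as a path on disk      -> filename=
      - bytes / bytearray are raw file contents -> blob=
      - anything with a callable .read()        -> file=
    """
    if isinstance(source, str):
        return {"filename": source}
    if isinstance(source, (bytes, bytearray)):
        return {"blob": bytes(source)}
    if callable(getattr(source, "read", None)):
        return {"file": source}
    raise TypeError("unsupported image source: %r" % type(source))
```

With this, something like Image(**image_source_kwargs(f)) would pass the urlopen result as file= rather than filename=.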

First, what urlopen returns is a file-like object. Of course "file-like object" is a somewhat vague term, and not all file-like objects work for all APIs that expect file-like objects (e.g., the API may expect to be able to call fileno and read from it at the POSIX level…), but this is at least worth trying (note file instead of filename):

with Image(file=f) as img:
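If an API turns out to need more than read() from the object (seek and tell, say), one option is to copy the network stream into an in-memory buffer first. The sketch below assumes Python 3; as_seekable is a hypothetical helper name, and whether Wand actually needs this is not something I have verified. Note that io.BytesIO still has no real file descriptor, so it will not help an API that insists on fileno.

```python
import io


def as_seekable(stream, chunk_size=64 * 1024):
    """Copy a read-only stream into an in-memory, seekable BytesIO buffer."""
    buffer = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the consumer starts at the first byte
    return buffer


# Possible usage (not run here; needs Wand and network access):
#   with Image(file=as_seekable(urlopen(url))) as img:
#       ...
```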

If that doesn't work, you can always read the data into memory:

buf = f.read()
with Image(blob=buf) as img:

This is less than ideal for giant files, since the whole PDF sits in memory at once, but at least you don't have to store it on disk.
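Putting the blob approach together, a fuller sketch of the in-memory pipeline might look like the following. It is written for Python 3 (where urllib2.urlopen lives in urllib.request) and has not been run against the PDF in the question. The function names, the default resolution of 150, the "%PDF-" sanity check, and the key naming scheme are all assumptions made for illustration; rendering PDFs through Wand also requires ImageMagick with Ghostscript available.

```python
from urllib.request import urlopen  # Python 3 location of urllib2.urlopen


def fetch_pdf_bytes(url, opener=urlopen):
    """Download `url` fully into memory and return the raw bytes."""
    with opener(url) as response:
        data = response.read()
    # Loose sanity check (an assumption of this sketch): the "%PDF-"
    # marker is expected near the start of a PDF file.
    if b"%PDF-" not in data[:1024]:
        raise ValueError("response does not look like a PDF")
    return data


def pdf_to_jpeg_blobs(pdf_bytes, resolution=150):
    """Render every page of the PDF to JPEG bytes, one blob per page."""
    # Imported lazily so the download helper is usable without Wand.
    from wand.image import Image

    blobs = []
    with Image(blob=pdf_bytes, format="pdf", resolution=resolution) as pdf:
        for page in pdf.sequence:
            # Copy each page into its own Image so it can be encoded alone.
            with Image(image=page) as single:
                single.format = "jpeg"
                blobs.append(single.make_blob())
    return blobs


def page_keys(prefix, page_count):
    """Hypothetical naming scheme: example-1.jpg, example-2.jpg, ..."""
    return ["%s-%d.jpg" % (prefix, n) for n in range(1, page_count + 1)]
```

Each returned blob is plain bytes, so nothing touches the disk; pairing page_keys(test_id, len(blobs)) with the blobs gives name and content for each upload. With boto3, for instance, each pair could be passed to an S3 client's put_object as Key and Body, though that step is not shown or tested here.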
