I am working on a script that converts a PDF from the internet (without saving it to disk) to a series of JPEGs, then saves the JPEGs to AWS S3.
Unfortunately, the code below only saves the first page of the PDF as a JPEG to AWS. Any ideas on how I can modify it to save every page to AWS with sequential file names?
from urllib2 import urlopen
from wand.image import Image
from io import BytesIO
import boto3

s3 = boto3.client(
    's3',
    aws_access_key_id='mykey',
    aws_secret_access_key='mykey'
)
bucket_name = 'testbucketAWS323'
#location on disk
#file prefix
test_id = 'example'
f = urlopen("https://s3.us-east-2.amazonaws.com/converted1jpgs/example.pdf")
bytes_io_file = BytesIO()
with Image(file=f) as img:
    print('pages = ', len(img.sequence))
    with img.convert('png') as converted:
        bytes_io_file = BytesIO(converted.make_blob('jpeg'))
        #code below should take 'converted' object, and save it to AWS as jpg.
        s3.upload_fileobj(bytes_io_file, bucket_name, "assssd.jpg")
        print 'done'
Just enumerate over the document pages (wand.image.Image.sequence) to get the page number and resource. With the page resource copied to a new instance of Image, export the blob directly, and don't worry about intermediate conversions.
from urllib2 import urlopen
from wand.image import Image
from io import BytesIO
import boto3

# ....
url = 'https://s3.us-east-2.amazonaws.com/converted1jpgs/example.pdf'
resource = urlopen(url)
with Image(file=resource) as document:
    # Each entry in document.sequence is one page of the PDF.
    for page_number, page in enumerate(document.sequence):
        # Copy the page into its own Image so it can be exported alone.
        with Image(page) as img:
            bytes_io_file = BytesIO(img.make_blob('JPEG'))
            filename = 'output_{0}.jpg'.format(page_number)
            s3.upload_fileobj(bytes_io_file, bucket_name, filename)
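If you want the keys to carry the file prefix from the question (test_id) and to sort in page order when listed in the bucket, the naming and upload steps could be pulled out roughly like this. This is only a sketch: page_key and upload_pages are hypothetical helper names, the zero-padding width and 1-based numbering are my own choices, and client is assumed to be a boto3 S3 client (anything exposing upload_fileobj(fileobj, bucket, key) would work).

```python
from io import BytesIO


def page_key(prefix, page_number, width=3):
    """Build a zero-padded, 1-based key such as 'example_001.jpg'.

    Hypothetical helper: the padding width and 1-based numbering are
    assumptions, chosen so keys sort lexicographically in page order.
    """
    return '{0}_{1:0{2}d}.jpg'.format(prefix, page_number + 1, width)


def upload_pages(blobs, client, bucket, prefix):
    """Upload each JPEG blob under a sequential key; return the keys used.

    `blobs` is any iterable of bytes objects, for example the result of
    img.make_blob('JPEG') for each page. `client` is assumed to be a
    boto3 S3 client.
    """
    keys = []
    for page_number, blob in enumerate(blobs):
        key = page_key(prefix, page_number)
        # Wrap the raw bytes in a file-like object for upload_fileobj.
        client.upload_fileobj(BytesIO(blob), bucket, key)
        keys.append(key)
    return keys
```

Inside the loop from the answer you would collect img.make_blob('JPEG') for each page and then call upload_pages(blobs, s3, bucket_name, test_id). For large PDFs it may be preferable to upload each page as it is rendered rather than holding every blob in memory.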
How about using the upload_fileobj method during the conversion?