
Opening a .docx file in S3 bucket in Python (Boto3)

In one of our S3 buckets, we have a .docx file with Mail Merge fields in it.

What I'm trying to do is read it directly from the bucket without first downloading it locally.

Normally, I open a local file and list its mail merge fields with this code:

from mailmerge import MailMerge
document = MailMerge(r'C:\Users\User\Desktop\MailMergeFile.docx') # Trying to get a variable to pass in here
print(document.get_merge_fields())

As the comment above indicates, I want to get the S3 object in a form I can pass straight to the MailMerge constructor, just as I would pass a path on my local machine.

The approaches I've found haven't worked. This is what I tried:

import boto3

s3 = boto3.client('s3')

fileobj = s3.get_object(
    Bucket='bucketname',
    Key='folder/mailmergefile.docx'
)

word_file = fileobj['Body'].read()  # raw bytes of the .docx
contents = word_file.decode('ISO-8859-1')  # can't use utf-8 as that gives an encoding error

contents

But when I pass the contents variable to MailMerge, I get another error:

document = MailMerge(contents)
print(document.get_merge_fields())

The error I get is: ValueError: embedded null character
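The error can be reproduced without S3 or MailMerge. A minimal sketch, assuming the str ends up being used as a file path (the helper name reproduce_null_char_error and the eight sample bytes are made up for illustration): decoding binary data with ISO-8859-1 turns every zero byte into the character '\x00', and Python refuses to open a path containing that character. The exact message differs by platform; the Windows traceback in the question says "embedded null character".

```python
def reproduce_null_char_error() -> str:
    # Eight bytes shaped like the start of a ZIP (.docx) file; note the zero bytes.
    raw = b"PK\x03\x04\x14\x00\x00\x00"
    # ISO-8859-1 maps each byte to exactly one character, so b"\x00" becomes "\x00".
    contents = raw.decode("ISO-8859-1")
    try:
        # A str argument is interpreted as a file path, and paths may not contain NUL.
        with open(contents, "rb"):
            pass
    except ValueError as exc:
        return str(exc)
    return "no error"


print(reproduce_null_char_error())  # message contains "embedded null" (wording differs by OS)
```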

I presume you are using docx-mailmerge from PyPI.

The documentation is quite sparse, but it shows MailMerge('input.docx'), which suggests that it expects the name of a file, not the contents of a file.

Looking at the source code, the constructor appears to hand its argument to Python's zipfile module, which makes sense because a .docx file is a ZIP archive.
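Because a .docx is a ZIP archive, it helps to know that zipfile.ZipFile accepts either a path or a binary file-like object. The sketch below illustrates this with a toy archive built in memory as a stand-in for a real .docx (the helper name zip_names_from_bytes and the archive contents are hypothetical); it is reopened from an io.BytesIO wrapper, so nothing is written to disk and nothing is decoded to text.

```python
import io
import zipfile
from typing import List


def zip_names_from_bytes(data: bytes) -> List[str]:
    # ZipFile accepts a path OR a binary file-like object; BytesIO wraps
    # the in-memory bytes so no local file is needed.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


# Build a toy archive in memory as a stand-in for a real .docx file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
docx_like_bytes = buffer.getvalue()

print(docx_like_bytes[:2])                    # b'PK', the ZIP signature
print(zip_names_from_bytes(docx_like_bytes))  # ['word/document.xml']
```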

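Bottom line: as written, it wants the name of a file, not the contents of the file. A str argument is treated as a path, so the decoded .docx contents, which contain null characters, are rejected with the ValueError you saw.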
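If you want to avoid a local file entirely, one possible approach, assuming MailMerge passes its argument unchanged to zipfile.ZipFile (which also accepts binary file-like objects), is to keep the S3 body as bytes and wrap it in io.BytesIO instead of decoding it. This is a sketch, not a documented docx-mailmerge feature; the helper name s3_object_as_file is hypothetical, and the client is passed in so the function can be exercised without AWS.

```python
import io
from typing import Any


def s3_object_as_file(s3_client: Any, bucket: str, key: str) -> io.BytesIO:
    """Fetch an S3 object and return its raw bytes as an in-memory binary file.

    s3_client is expected to behave like boto3.client("s3"): get_object()
    returns a dict whose "Body" entry has a read() method returning bytes.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    data = response["Body"].read()  # keep the raw bytes; do NOT decode them
    return io.BytesIO(data)
```

With s3 = boto3.client('s3') as in the question, usage would then be: document = MailMerge(s3_object_as_file(s3, 'bucketname', 'folder/mailmergefile.docx')), followed by print(document.get_merge_fields()). If the library turns out to require a real path after all, the fallback is to write the same bytes to a temporary file and pass its name.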
