简体   繁体   中英

How to read AWS S3 stored word document (.doc and .docx) file content using AWS Lambda Python?

My scenario, I am trying to implement read AWS Stored S3 word document (.doc and .docx) file content from Aws Lambda by using python. Below code I am using, My problem is I can able to get the file name but I can't able to read content.

def lambda_handler(event, context):

    file_contents = s3.Object(‘Bucketname’, 'sample.docx').get()['Body'].read().decode("unicode-escape")

    return {
         'File Name' : obj.key,
         ‘Content’ : file_contents
            }

Response: { "errorMessage": "'unicodeescape' codec can't decode bytes in position 25818-25819: truncated \\xXX escape", "errorType": "UnicodeDecodeError", "stackTrace": [ [ "/var/task/lambda_function.py", 76, "lambda_handler", "file_contents = s3.Object('Bucketname', 'sample.docx').get()['Body'].read().decode(\\"unicode-escape\\")" ] ] }

.docx 和 .doc 文件是二进制文件,所以简单的解码是行不通的,也许docx2txt可能会在这里有所帮助。

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM