How to read AWS S3 stored word document (.doc and .docx) file content using AWS Lambda Python?

Question

I am trying to read the content of Word documents (.doc and .docx) stored in AWS S3 from an AWS Lambda function written in Python. The code I am using is below. I can get the file name, but I cannot read the content.

def lambda_handler(event, context):

    file_contents = s3.Object('Bucketname', 'sample.docx').get()['Body'].read().decode("unicode-escape")

    return {
        'File Name': obj.key,
        'Content': file_contents
    }

Response: { "errorMessage": "'unicodeescape' codec can't decode bytes in position 25818-25819: truncated \\xXX escape", "errorType": "UnicodeDecodeError", "stackTrace": [ [ "/var/task/lambda_function.py", 76, "lambda_handler", "file_contents = s3.Object('Bucketname', 'sample.docx').get()['Body'].read().decode(\\"unicode-escape\\")" ] ] }

Answer 1

.docx and .doc files are binary files, so simply decoding the bytes will not work. docx2txt may help here.
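To illustrate why decoding fails: a .docx file is a ZIP archive whose main text lives in word/document.xml, so the bytes read from S3 have to be unpacked and parsed rather than decoded. The following is only a rough sketch using the Python standard library instead of docx2txt (which would need to be packaged with the Lambda deployment). The helper name extract_docx_text is hypothetical, and the sketch does not handle the older binary .doc format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(data: bytes) -> str:
    """Return the paragraph text of a .docx given its raw bytes.

    Hypothetical helper for illustration; it ignores headers, footers,
    footnotes and anything outside word/document.xml.
    """
    # A .docx is a ZIP archive, so open the bytes as one in memory
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each <w:t> element holds one run of visible text
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)
```

In the handler from the question, the result of s3.Object('Bucketname', 'sample.docx').get()['Body'].read() could be passed to this function in place of the .decode("unicode-escape") call.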

solution1
0 2019-01-30 08:46:26