Read a .pptx file from S3

Question

I am trying to open a .pptx file from Amazon S3 and read it with the python-pptx library. This is the code:

from pptx import Presentation
import boto3
s3 = boto3.resource('s3')

obj = s3.Object('bucket', 'key')
body = obj.get()['Body']
prs = Presentation(body)

It raises "AttributeError: 'StreamingBody' object has no attribute 'seek'". Shouldn't this work? How can I fix it? I also tried calling read() on body first. Is there a solution that does not involve actually downloading the file?

Answer 1

To load a file from S3, download it (or stream it) and wrap the bytes in io.BytesIO, which gives pptx.Presentation a seekable file-like object it can handle.

import io
import boto3

from pptx import Presentation

s3 = boto3.client('s3')
s3_response_object = s3.get_object(Bucket='bucket', Key='file.pptx')
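# Read the whole StreamingBody into memory as bytes
object_content = s3_response_object['Body'].read()

# Wrap the bytes in a seekable in-memory file object
prs = Presentation(io.BytesIO(object_content))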
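
A variant of the same idea, as a minimal sketch: the boto3 S3 client method download_fileobj writes an object into any writable binary file object, so the bytes can go straight into an io.BytesIO without a temporary file on disk. The helper name load_s3_object_to_buffer is hypothetical (it is not part of boto3 or python-pptx), and the bucket and key values are placeholders.

```python
import io


def load_s3_object_to_buffer(s3_client, bucket, key):
    """Return a seekable in-memory buffer holding the S3 object's bytes.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client('s3').
    The helper name is illustrative only (an assumption, not a library API).
    """
    buffer = io.BytesIO()
    # download_fileobj writes the object's content into the file-like object.
    s3_client.download_fileobj(bucket, key, buffer)
    # Rewind so the consumer starts reading from the first byte.
    buffer.seek(0)
    return buffer
```

With real credentials this could be used as prs = Presentation(load_s3_object_to_buffer(boto3.client('s3'), 'bucket', 'file.pptx')). The whole file still ends up in memory; only the file on disk is avoided.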

Ref:

"Just like what we do with variables, data can be kept as bytes in an in-memory buffer when we use the io module's Byte IO operations." (journaldev)
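
To illustrate why the seekable buffer matters: a .pptx file is a ZIP package, and a ZIP reader has to seek to the end of the file to find the archive's directory. The sketch below uses only the standard library. ForwardOnlyStream is a hypothetical stand-in for a stream such as StreamingBody that offers read() but no seek(), and the archive contents are made up for the demonstration.

```python
import io
import zipfile


class ForwardOnlyStream:
    """Hypothetical stand-in for a stream with read() but no seek()."""

    def __init__(self, data):
        self._buffer = io.BytesIO(data)

    def read(self, size=-1):
        return self._buffer.read(size)


def make_zip_bytes():
    """Build a tiny ZIP archive in memory (a .pptx is also a ZIP package)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as archive:
        archive.writestr("hello.txt", "hi")
    return out.getvalue()


def list_names(fileobj):
    """Open a file-like object as a ZIP archive and return its member names."""
    with zipfile.ZipFile(fileobj) as archive:
        return archive.namelist()


zip_bytes = make_zip_bytes()

# Handing the forward-only stream to the ZIP reader fails: it has no seek().
try:
    list_names(ForwardOnlyStream(zip_bytes))
    direct_read_worked = True
except AttributeError:
    direct_read_worked = False

# Draining the stream into BytesIO gives a seekable buffer that works.
names = list_names(io.BytesIO(ForwardOnlyStream(zip_bytes).read()))
```

This mirrors the fix in the answer: calling read() on the stream alone returns raw bytes, which is not a file object either, so the bytes need the io.BytesIO wrapper before they are passed on.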

Answer 1: score 8, accepted, 2021-01-25 18:50:35
