CSV file upload from buffer to S3
I am trying to upload content taken out of a model in Django as a csv file. I don't want to save the file locally, but keep it in a buffer and upload it to S3. Currently, this code does not error as is, and it uploads the file properly; however, the file is empty.
import csv
import datetime
import io

import boto3

file_name = 'some_file.csv'
fields = [list_of_fields]
header = [header_fields]

buff = io.StringIO()
writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(header)
for value in some_queryset:
    row = []
    for field in fields:
        ...  # filling in the row
    writer.writerow(row)

# Upload to s3
client = boto3.client('s3')
bucket = 'some_bucket_name'
date_time = datetime.datetime.now()
date = date_time.date()
time = date_time.time()
dt = '{year}_{month}_{day}__{hour}_{minute}_{second}'.format(
    day=date.day,
    hour=time.hour,
    minute=time.minute,
    month=date.month,
    second=time.second,
    year=date.year,
)
key = 'some_name_{0}.csv'.format(dt)
client.upload_fileobj(buff, bucket, key)
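As an aside, the timestamp string above can be built more compactly with strftime (note that strftime zero-pads each component, unlike the format() call above; the fixed datetime below is just for a reproducible illustration):

```python
import datetime

# Hypothetical fixed timestamp so the output is reproducible
date_time = datetime.datetime(2023, 3, 5, 9, 7, 2)
dt = date_time.strftime('%Y_%m_%d__%H_%M_%S')
print(dt)  # 2023_03_05__09_07_02
```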
If I take the buffer's content, it is definitely being written:

content = buff.getvalue()
content.encode('utf-8')
print("content: {0}".format(content))  # prints the csv content
EDIT: I am doing a similar thing with a zip file, created in a buffer:

with zipfile.ZipFile(buff, 'w') as archive:
    ...

I write to the archive (adding pdf files that I am generating), and once I am done, I execute this:

buff.seek(0)

which seems to be necessary. If I do a similar thing above, it errors out with:

Unicode-objects must be encoded before hashing
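To illustrate why that seek is needed (a minimal sketch, independent of S3): after writing, the buffer's position sits at the end, and upload_fileobj reads from the current position onward.

```python
import io
import zipfile

buff = io.BytesIO()
with zipfile.ZipFile(buff, 'w') as archive:
    archive.writestr('hello.txt', 'hello')

# After writing, the position is at the end, so a read returns nothing
assert buff.read() == b''

# Rewinding exposes the full archive again
buff.seek(0)
data = buff.read()
assert data[:2] == b'PK'  # zip local-file-header magic bytes
```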
Okay, disregard my earlier answer; I found the actual problem. According to the boto3 documentation for the upload_fileobj function, the first parameter (Fileobj) needs to implement a read() method that returns bytes:

Fileobj (a file-like object) -- A file-like object to upload. At a minimum, it must implement the read method, and must return bytes.
The read() function on a _io.StringIO object returns a string, not bytes. I would suggest swapping the StringIO object for a BytesIO object, adding in the necessary encoding and decoding.
Here is a minimal working example. It's not the most efficient solution - the basic idea is to copy the contents over to a second BytesIO object.
import io
import boto3
import csv
buff = io.StringIO()
writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])
buff2 = io.BytesIO(buff.getvalue().encode())
bucket = 'changeme'
key = 'blah.csv'
client = boto3.client('s3')
client.upload_fileobj(buff2, bucket, key)
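A variant that avoids the second copy is to wrap a BytesIO in an io.TextIOWrapper, so csv.writer writes text while the encoded bytes accumulate directly in the underlying buffer (a sketch; the bucket and key are placeholders):

```python
import csv
import io

buff = io.BytesIO()
# newline='' stops the wrapper from translating the csv module's line endings
wrapper = io.TextIOWrapper(buff, encoding='utf-8', newline='')
writer = csv.writer(wrapper, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])
wrapper.flush()  # push the buffered text down into the BytesIO
buff.seek(0)     # rewind so upload_fileobj reads from the start

print(buff.getvalue())  # b'a,b,c\r\n'
# client.upload_fileobj(buff, bucket, key)
```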
As explained here, using the method put_object rather than upload_fileobj would do the job with an io.StringIO object buffer. So here, to match the initial example:
client = boto3.client('s3')
client.upload_fileobj(buff2, bucket, key)
would become
client = boto3.client('s3')
client.put_object(Body=buff2, Bucket=bucket, Key=key, ContentType='application/vnd.ms-excel')
Have you tried calling buff.flush() first? It's possible that your entirely sensible debugging check (calling getvalue()) is creating the illusion that the buffer has been written to, when it hasn't been if you don't call it.
You can use something like goofys to redirect output to S3.