CSV file upload from buffer to S3
I am trying to upload content taken out of a model in Django as a csv file. I don't want to save the file locally, but keep it in a buffer and upload it to S3. Currently, this code does not error as is, and it uploads the file properly; however, the file is empty.
import csv
import datetime
import io

import boto3

file_name = 'some_file.csv'
fields = [list_of_fields]
header = [header_fields]

buff = io.StringIO()
writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(header)
for value in some_queryset:
    row = []
    for field in fields:
        ...  # filling in the row
    writer.writerow(row)

# Upload to s3
client = boto3.client('s3')
bucket = 'some_bucket_name'
date_time = datetime.datetime.now()
date = date_time.date()
time = date_time.time()
dt = '{year}_{month}_{day}__{hour}_{minute}_{second}'.format(
    day=date.day,
    hour=time.hour,
    minute=time.minute,
    month=date.month,
    second=time.second,
    year=date.year,
)
key = 'some_name_{0}.csv'.format(dt)
client.upload_fileobj(buff, bucket, key)
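As an aside, the timestamp string above can be built more compactly with strftime (note that strftime zero-pads each component, unlike the format() call above; the fixed datetime below is just for a reproducible illustration):

```python
import datetime

# Hypothetical fixed timestamp so the output is reproducible
date_time = datetime.datetime(2023, 3, 5, 9, 7, 2)
dt = date_time.strftime('%Y_%m_%d__%H_%M_%S')
print(dt)  # 2023_03_05__09_07_02
```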
If I take the buffer's content, it is definitely being written:

content = buff.getvalue()
content.encode('utf-8')
print("content: {0}".format(content))  # prints the csv content
EDIT: I am doing a similar thing with a zip file, created in a buffer:

with zipfile.ZipFile(buff, 'w') as archive:
    ...

I write to the archive (adding pdf files that I am generating), and once I am done, I execute this:

buff.seek(0)

which seems to be necessary. If I do a similar thing above, it errors out with:

Unicode-objects must be encoded before hashing
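To illustrate why that seek is needed (a minimal sketch, independent of S3): after writing, the buffer's position sits at the end, and upload_fileobj reads from the current position onward.

```python
import io
import zipfile

buff = io.BytesIO()
with zipfile.ZipFile(buff, 'w') as archive:
    archive.writestr('hello.txt', 'hello')

# After writing, the position is at the end, so a read returns nothing
assert buff.read() == b''

# Rewinding exposes the full archive again
buff.seek(0)
data = buff.read()
assert data[:2] == b'PK'  # zip local-file-header magic bytes
```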
Okay, disregard my earlier answer; I found the actual problem. According to the boto3 documentation for the upload_fileobj function, the first parameter (Fileobj) needs to implement a read() method that returns bytes:

Fileobj (a file-like object) -- A file-like object to upload. At a minimum, it must implement the read method, and must return bytes.
The read() function on a _io.StringIO object returns a string, not bytes. I would suggest swapping the StringIO object for a BytesIO object, adding in the necessary encoding and decoding.
Here is a minimal working example. It's not the most efficient solution - the basic idea is to copy the contents over to a second BytesIO object.
import io
import boto3
import csv
buff = io.StringIO()
writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])
buff2 = io.BytesIO(buff.getvalue().encode())
bucket = 'changeme'
key = 'blah.csv'
client = boto3.client('s3')
client.upload_fileobj(buff2, bucket, key)
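A variant that avoids the second copy is to wrap a BytesIO in an io.TextIOWrapper, so csv.writer writes text while the encoded bytes accumulate directly in the underlying buffer (a sketch; the bucket and key are placeholders):

```python
import csv
import io

buff = io.BytesIO()
# newline='' stops the wrapper from translating the csv module's line endings
wrapper = io.TextIOWrapper(buff, encoding='utf-8', newline='')
writer = csv.writer(wrapper, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])
wrapper.flush()  # push the buffered text down into the BytesIO
buff.seek(0)     # rewind so upload_fileobj reads from the start

print(buff.getvalue())  # b'a,b,c\r\n'
# client.upload_fileobj(buff, bucket, key)
```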
As explained here, using the method put_object rather than upload_fileobj would do the job with an io.StringIO object buffer. So here, to match the initial example:
client = boto3.client('s3')
client.upload_fileobj(buff2, bucket, key)
would become
client = boto3.client('s3')
client.put_object(Body=buff2, Bucket=bucket, Key=key, ContentType='application/vnd.ms-excel')
Have you tried calling buff.flush() first? It's possible that your entirely sensible debugging check (calling getvalue()) is creating the illusion that the buffer has been written to, when it hasn't been if you don't call it.
You can use something like goofys to redirect output to S3.