
CSV export in Stream (from Django admin on Heroku)

We need to export a CSV file containing model data from the Django admin, which runs on Heroku. We therefore created an admin action that builds the CSV and returns it in the response. This worked fine until our client started exporting huge data sets and we ran into the 30-second timeout of the web worker.

To circumvent this problem we thought about streaming the CSV to the client instead of building it in memory first and sending it in one piece. The trigger was this piece of information:

Cedar supports long-polling and streaming responses. Your app has an initial 30 second window to respond with a single byte back to the client. After each byte sent (either received from the client or sent by your application) you reset a rolling 55 second window. If no data is sent during the 55 second window your connection will be terminated.

We therefore implemented something that looks like this to test it:

import cStringIO as StringIO
import csv
import time

from django.http import HttpResponse

def csv_view(request):  # renamed: "def csv" would shadow the imported csv module
    csvfile = StringIO.StringIO()
    csvwriter = csv.writer(csvfile)

    def read_and_flush():
        # Return everything written so far and empty the buffer.
        csvfile.seek(0)
        data = csvfile.read()
        csvfile.seek(0)
        csvfile.truncate()
        return data

    def data():
        for i in xrange(100000):
            csvwriter.writerow([i, "a", "b", "c"])
            time.sleep(1)
            yield read_and_flush()

    response = HttpResponse(data(), mimetype="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response
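
As an aside: on Django 1.5 and later (newer than what we run here, so this is a sketch we have not tested), the documented API for exactly this pattern is StreamingHttpResponse, and the mimetype keyword was renamed content_type:

from django.http import StreamingHttpResponse

def csv_stream_view(request):
    def rows():
        # Build each CSV line directly instead of going through a buffer.
        for i in xrange(100000):
            yield "%d,a,b,c\n" % i

    response = StreamingHttpResponse(rows(), content_type="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response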

The HTTP headers of the download look like this (from Firebug):

HTTP/1.1 200 OK
Cache-Control: max-age=0
Content-Disposition: attachment; filename=jobentity-job2.csv
Content-Type: text/csv
Date: Tue, 27 Nov 2012 13:56:42 GMT
Expires: Tue, 27 Nov 2012 13:56:41 GMT
Last-Modified: Tue, 27 Nov 2012 13:56:41 GMT
Server: gunicorn/0.14.6
Vary: Cookie
Transfer-Encoding: chunked
Connection: keep-alive

"Transfer-encoding: chunked" would indicate that Cedar is actually streaming the content chunkwise we guess. “转移编码:chunked”表示Cedar实际上是按照我们猜测的方式流式传输内容。

The problem is that the download of the CSV is still interrupted after 30 seconds, with these lines in the Heroku log:

2012-11-27T13:00:24+00:00 app[web.1]: DEBUG: exporting tasks in csv-stream for job id: 56, 
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [2] [CRITICAL] WORKER TIMEOUT (pid:5)
2012-11-27T13:00:54+00:00 heroku[router]: at=info method=POST path=/admin/jobentity/ host=myapp.herokuapp.com fwd= dyno=web.1 queue=0 wait=0ms connect=2ms service=29480ms status=200 bytes=51092
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [2] [CRITICAL] WORKER TIMEOUT (pid:5)
2012-11-27T13:00:54+00:00 app[web.1]: 2012-11-27 13:00:54 [12] [INFO] Booting worker with pid: 12

This should work conceptually, right? Is there anything we missed?

We really appreciate your help. Tom

I found the solution to the problem. It's not a Heroku timeout, because otherwise there would be an H12 timeout in the Heroku log (thanks to Caio of Heroku for pointing that out).

The problem was the default timeout of Gunicorn, which is 30 seconds. After adding --timeout 600 to the Procfile (on the Gunicorn line) the problem was gone.

The Procfile now looks like this:

web: gunicorn myapp.wsgi -b 0.0.0.0:$PORT --timeout 600
celeryd: python manage.py celeryd -E -B --loglevel=INFO
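
Equivalently (a sketch, assuming you prefer a config file over command-line flags; gunicorn config files are plain Python loaded with -c), the timeout can live in a module such as a hypothetical gunicorn.conf.py:

# gunicorn.conf.py -- run with: gunicorn -c gunicorn.conf.py myapp.wsgi
bind = "0.0.0.0:8000"   # in the Procfile this would still come from $PORT
timeout = 600           # seconds of worker silence before gunicorn kills it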

That's rather not a problem with your script, but a problem with Heroku's default 30-second web request timeout. You could read this: https://devcenter.heroku.com/articles/request-timeout and, according to that doc, move your CSV export to a background process.
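
A minimal sketch of that background-process approach, assuming Celery (which the Procfile above already runs) and a hypothetical Job model with an export_file FileField; the task builds the CSV outside the request cycle and saves it for a later download link:

import csv
import cStringIO as StringIO

from celery.task import task
from django.core.files.base import ContentFile

@task
def export_job_csv(job_id):
    from myapp.models import Job  # hypothetical model

    job = Job.objects.get(pk=job_id)
    buf = StringIO.StringIO()
    writer = csv.writer(buf)
    for t in job.task_set.all():  # hypothetical related task rows
        writer.writerow([t.pk, t.name, t.status])
    # Persist the finished file; the admin can then offer it for download.
    job.export_file.save("job-%d.csv" % job.pk, ContentFile(buf.getvalue()))

The admin action then just queues export_job_csv.delay(job.pk) and returns immediately, so the web worker never comes near the timeout.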
