
Flask: Streaming data by writing to client?

I have existing code that serializes data to a file-like object:

def some_serialization_function(file):
    file.write(...)

In Flask, I would like to be able to send the serialized data directly to the client, without buffering it in memory first.

I looked at ResponseStreamMixin from werkzeug, but I don't think it can work without buffering:

class StreamResponse(flask.Response, werkzeug.wrappers.ResponseStreamMixin):
    pass

@app.route("/data")
def get_data():
    r = StreamResponse()
    some_serialization_function(r.stream)  # everything is buffered into memory
    return r  # buffered data is sent after return

All the examples of streaming data that I found are based on generators, which work in the opposite direction (i.e. data is "pulled" from the generator, not "pushed out" via a write call). So I wonder: is there a way to "write" directly to the client in Flask?

EDIT - to be clearer: I'm looking for a way to serve the data generated by "some_serialization_function(...)" (which I cannot easily change) without the memory/IO overhead of having that function write all the data to a buffer/file first.

(I suspect that a tempfile will be the way to go in the end, since the IO overhead will not be significant compared to the overhead of actually sending the data over the network. Also, my main concern is memory overhead.)
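For reference, the spool-to-tempfile approach might be sketched like this (fake_serializer is a hypothetical stand-in for some_serialization_function, and the chunk size is arbitrary):

```python
import tempfile

def fake_serializer(f):
    # hypothetical stand-in for some_serialization_function
    f.write(b"x" * 10000)

def stream_from_tempfile(serialize, chunk_size=4096):
    # Serialize everything to an unnamed temporary file first (some IO
    # overhead, but constant memory), then replay it to the client in chunks.
    tmp = tempfile.TemporaryFile()
    serialize(tmp)
    tmp.seek(0)
    while True:
        chunk = tmp.read(chunk_size)
        if not chunk:
            tmp.close()
            return
        yield chunk

chunks = list(stream_from_tempfile(fake_serializer))
print(len(chunks), sum(len(c) for c in chunks))  # 3 10000
```

The resulting generator could then be handed to flask.Response, the same way as in the answers below.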

You can create a special file-like object that feeds a generator, which in turn streams out to the client. Here is a quick and dirty implementation using a queue:

from queue import Queue

class StreamWriter(object):
    _EOF = object()  # unique end-of-stream sentinel (cannot collide with real data)

    def __init__(self):
        self.queue = Queue()

    def write(self, data):
        self.queue.put(data)

    def read(self):
        data = self.queue.get()
        self.queue.task_done()
        if data is self._EOF:
            return None
        return data

    def close(self):
        self.queue.put(self._EOF)  # indicate EOF

This is nothing more than a pub-sub style queue. The read() method will block until something is written from another thread.

Now you can stream a response using a generator. The following example shows a generator that takes the serialization function as an argument. The serialization function is executed in a background thread and receives the file-like object as an argument.

import threading

def generate_response(serialize):
    file = StreamWriter()

    def serialize_task():
        serialize(file)  # pushes chunks into the queue...
        file.close()     # ...then signals EOF

    threading.Thread(target=serialize_task).start()
    while True:
        chunk = file.read()
        if chunk is None:
            break
        yield chunk
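The pattern can be sanity-checked end to end without Flask. Everything is redefined below so the sketch runs standalone; fake_serializer is a hypothetical stand-in for the question's some_serialization_function:

```python
import threading
from queue import Queue

class StreamWriter(object):
    _EOF = object()  # unique end-of-stream sentinel

    def __init__(self):
        self.queue = Queue()

    def write(self, data):
        self.queue.put(data)

    def read(self):
        data = self.queue.get()
        self.queue.task_done()
        return None if data is self._EOF else data

    def close(self):
        self.queue.put(self._EOF)

def generate_response(serialize):
    file = StreamWriter()

    def serialize_task():
        serialize(file)  # background thread pushes chunks into the queue...
        file.close()     # ...then signals EOF

    threading.Thread(target=serialize_task).start()
    while True:
        chunk = file.read()
        if chunk is None:
            break
        yield chunk

def fake_serializer(f):
    # hypothetical stand-in for some_serialization_function
    for i in range(3):
        f.write("chunk-%d;" % i)

print("".join(generate_response(fake_serializer)))  # chunk-0;chunk-1;chunk-2;
```

In a route, the generator would simply be returned wrapped in a Response, e.g. flask.Response(generate_response(some_serialization_function)).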

I hope this helps!

If I understood you well, you want:

  • a Flask web app providing a stream of data
  • the client to get the data piece by piece, not in one large chunk
  • the Flask web app to be in control, thus initiating the writes.

I think this cannot be done, as someone must be in control of the flow, and in the case of a web app it is the client who is reading the data.

On the other hand, if you want to prevent buffering the whole content before it is provided to the client, you can read the data on the web server piece by piece and yield it part by part.

Providing content piece by piece from the server

from flask import Flask, Response, request
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello World!'

@app.route('/loop')
def loop():
    def generate():
        yield "Hello"
        yield "World"
    return Response(generate())

@app.route('/longloop/<int:rows>')
def longloop(rows):
    def generate(rows):
        for i in range(rows):  # use xrange on Python 2
            yield "{i}: Hello World".format(i=i)
    return Response(generate(rows))

if __name__ == '__main__':
    app.run(debug=True)

The trick is to use a Response object with a generator producing the output.

If you visit http://localhost:5000/longloop/100 , you shall receive 100 greetings.

Try this from the command line using curl, preferably redirecting the output to /dev/null :

$ curl -X GET http://localhost:5000/longloop/120000000 > /dev/null                                                                                                                              
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  538M    0  538M    0     0  1056k      0 --:--:--  0:08:41 --:--:-- 1079k

As we can see, the script has now been running for more than 8 minutes, and the memory consumed by the Flask app stays about the same; in my case it remains at 0.4% of total RAM.
