
How to share in-memory resources between Flask methods when deploying with Gunicorn

I have implemented a simple microservice using Flask, where the method that handles the request calculates a response based on the request data and a rather large data structure loaded into memory. Now, when I deploy this application using gunicorn and a large number of workers, I would simply like to share the data structure between the request handlers of all workers. Since the data is only read, there is no need for locking or anything similar. What is the best way to do this?

Essentially what would be needed is this:

  • load/create the large data structure when the server is initialized
  • somehow get a handle inside the request handling method to access the data structure

As far as I understand, gunicorn allows me to implement various hook functions, e.g. for the time when the server gets initialized, but a Flask request handler method does not know anything about the gunicorn server data structures.
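For reference, gunicorn's server hooks live in a config module and look roughly like this (a sketch using two of gunicorn's documented hooks; note that the hooks receive gunicorn's own server/worker objects, not the Flask app, which is why they cannot directly hand a data structure to a request handler):

```python
# gunicorn.conf.py -- sketch of gunicorn server hooks.
# These run inside gunicorn's master/worker processes; the Flask
# request handlers have no reference to anything created here.

def on_starting(server):
    # Runs in the master process before any workers are forked.
    # 'server' is gunicorn's Arbiter, not the Flask application.
    server.log.info("master process starting")

def post_fork(server, worker):
    # Runs in each worker process just after it has been forked.
    server.log.info("worker %s forked", worker.pid)
```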

I do not want to use something like redis or a database system for this, since all the data lives in a data structure that needs to be loaded in memory, and no deserialization should be involved.

The calculation carried out for each request that uses the large data structure can be lengthy, so it must happen concurrently in a truly independent thread or process for each request (this should scale up by running on a multi-core machine).

You can use preloading.

This allows you to create the data structure ahead of time and then fork each request-handling process from it. This works because of copy-on-write and the fact that you are only reading from the large data structure.

Note: Although this will work, it should probably only be used for very small apps or in a development environment. A more production-friendly way of doing this would be to queue these calculations as tasks on the backend, since they will be long-running. You can then notify users when they complete.
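A minimal sketch of that task-queue idea, using only Flask and the standard library (a real deployment would more likely use a dedicated queue such as Celery or RQ; the routes, helper names, and in-memory registry here are all illustrative):

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

import flask

app = flask.Flask(__name__)
executor = ThreadPoolExecutor(max_workers=4)
tasks = {}  # illustrative in-memory registry: task id -> Future

def long_calculation(n):
    # stand-in for the lengthy computation over the large data structure
    return sum(range(n))

@app.route('/submit/<int:n>')
def submit(n):
    # enqueue the calculation and return a handle immediately
    task_id = str(uuid.uuid4())
    tasks[task_id] = executor.submit(long_calculation, n)
    return {'task_id': task_id}

@app.route('/status/<task_id>')
def status(task_id):
    # poll for completion; a real system might push a notification instead
    future = tasks.get(task_id)
    if future is None:
        return {'state': 'unknown'}, 404
    if future.done():
        return {'state': 'done', 'result': future.result()}
    return {'state': 'pending'}
```

A client would POST/GET /submit, keep the returned task_id, and poll /status until the state flips to done.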


Here is a little snippet showing the difference preloading makes.

# app.py

import flask

app = flask.Flask(__name__)

def load_data():
    # stands in for building the large data structure
    print('calculating some stuff')
    return {'big': 'data'}

# runs at import time: once per worker without --preload,
# once in the master (before forking) with --preload
data = load_data()

@app.route('/')
def index():
    return repr(data)

Running with gunicorn app:app --workers 2:

[2017-02-24 09:01:01 -0500] [38392] [INFO] Starting gunicorn 19.6.0
[2017-02-24 09:01:01 -0500] [38392] [INFO] Listening at: http://127.0.0.1:8000 (38392)
[2017-02-24 09:01:01 -0500] [38392] [INFO] Using worker: sync
[2017-02-24 09:01:01 -0500] [38395] [INFO] Booting worker with pid: 38395
[2017-02-24 09:01:01 -0500] [38396] [INFO] Booting worker with pid: 38396
calculating some stuff
calculating some stuff

And running with gunicorn app:app --workers 2 --preload:

calculating some stuff
[2017-02-24 09:01:06 -0500] [38403] [INFO] Starting gunicorn 19.6.0
[2017-02-24 09:01:06 -0500] [38403] [INFO] Listening at: http://127.0.0.1:8000 (38403)
[2017-02-24 09:01:06 -0500] [38403] [INFO] Using worker: sync
[2017-02-24 09:01:06 -0500] [38406] [INFO] Booting worker with pid: 38406
[2017-02-24 09:01:06 -0500] [38407] [INFO] Booting worker with pid: 38407
