
Many producers, single consumer with Python/mod_wsgi

Posted 2010-03-12 13:21:34. Tags: python, apache, concurrency, synchronization, producer

I have a Pylons web application served by Apache (mod_wsgi, prefork). Because of Apache, there are multiple separate processes running my application code concurrently. I want to defer some of the application's non-critical tasks to background processing to improve "live" response times. So I'm thinking of a task queue: many Apache processes add tasks to the queue, and a single separate Python process handles them one by one and removes them from the queue.

The queue should preferably be persisted to disk so that queued, unprocessed tasks are not lost because of a power outage, server restart, etc. The question is: what would be a reasonable way to implement such a queue?

As for the things I've tried: I started with a simple SQLite database with a single table for storing queue items. In load testing, as I increased the level of concurrency, I started getting "database locked" errors, as expected. The quick'n'dirty fix was to replace SQLite with MySQL. It handles the concurrency issues well but feels like overkill for the simple thing I need to do. Queue-related DB operations also show up prominently in my profiling reports.
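For reference, the single-table SQLite approach described above might look roughly like the sketch below. The table name `tasks`, the helper names, and the 30-second busy timeout are all assumptions for illustration, not the code from the question. A longer timeout and short transactions may reduce "database is locked" errors under load, but they are unlikely to remove them entirely with many concurrent writers.

```python
import json
import sqlite3


def open_queue(path):
    # timeout: how long a connection waits on a lock before SQLite raises
    # "database is locked". isolation_level=None puts the connection in
    # autocommit mode so we can issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    return conn


def enqueue(conn, task):
    # Producer side: one INSERT in autocommit mode is its own short transaction.
    conn.execute("INSERT INTO tasks (payload) VALUES (?)", (json.dumps(task),))


def dequeue(conn):
    # Consumer side: BEGIN IMMEDIATE takes the write lock up front, so the
    # SELECT and DELETE happen as one unit.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM tasks ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("DELETE FROM tasks WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else json.loads(row[1])
```

Note that this sketch deletes a task before it is processed, so a crash in the consumer between `dequeue` and the actual work would lose that task.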

2 answers

A message broker like Apache's ActiveMQ is an ideal solution here.

The pipeline could be as follows:

The application process that handles HTTP requests generates replies quickly and sends low-priority, heavy tasks to an AMQ queue.
One or more other processes subscribe to the AMQ queue and carry out these heavy tasks.

The queue persistence requirement is met out of the box, since ActiveMQ keeps messages that have not yet been consumed in persistent storage. It also scales quite well, since you're free to deploy multiple HTTP apps, multiple consumer apps, and AMQ itself each on different machines.

We use something like this in our project, which is written in Python and uses STOMP as the underlying communication protocol.
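To make the producer side concrete, here is a minimal sketch of the STOMP `SEND` frame a web process would write to the broker for each task. It is not the answerer's code: the function name, the destination `/queue/tasks`, and the JSON payload are hypothetical. As far as I know, ActiveMQ treats messages sent over STOMP as non-persistent unless the frame carries a `persistent:true` header, so the sketch sets it by default; check the broker documentation for your version.

```python
import json


def build_send_frame(destination, task, persistent=True):
    # A STOMP frame is: command line, header lines, a blank line,
    # the body, and a terminating NUL byte.
    body = json.dumps(task).encode("utf-8")
    headers = [
        ("destination", destination),
        ("content-length", str(len(body))),
    ]
    if persistent:
        # Assumption: ActiveMQ-specific header asking the broker to
        # store the message on disk until it is consumed.
        headers.append(("persistent", "true"))
    head = "SEND\n" + "".join(f"{k}:{v}\n" for k, v in headers) + "\n"
    return head.encode("utf-8") + body + b"\x00"
```

In a real session the client first exchanges `CONNECT`/`CONNECTED` frames over a TCP socket; a STOMP client library would normally handle that and the framing for you.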

A web server (any web server) is a multi-producer, single-consumer process.

A simple solution is to build a wsgiref or Werkzeug backend server to handle your backend requests.

Since this "backend" server is built using WSGI technology, it's very, very similar to the front-end web server, except that it doesn't produce HTML responses (JSON is usually simpler). Other than that, it's very straightforward.

You design RESTful transactions for this backend. You use all of the various WSGI features for URI parsing, authorization, authentication, etc. You generally don't need session management, since RESTful servers don't usually offer sessions.
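A backend along these lines could be sketched with wsgiref as follows. This is only one possible reading of the idea: the `/tasks` path, the `202 Accepted` reply, the in-memory `queue.Queue`, and the worker thread are assumptions added for illustration. The queue here lives in memory only, so it does not by itself provide the on-disk persistence the question asks for.

```python
import json
import queue
import threading
from wsgiref.simple_server import make_server

tasks = queue.Queue()  # in-memory only; tasks are lost if the process dies

JSON_HEADERS = [("Content-Type", "application/json")]


def app(environ, start_response):
    # Single RESTful endpoint: POST /tasks with a JSON body.
    if environ["REQUEST_METHOD"] != "POST" or environ.get("PATH_INFO") != "/tasks":
        start_response("404 Not Found", JSON_HEADERS)
        return [b'{"error": "not found"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        task = json.loads(environ["wsgi.input"].read(length))
    except ValueError:
        start_response("400 Bad Request", JSON_HEADERS)
        return [b'{"error": "invalid JSON"}']
    tasks.put(task)
    start_response("202 Accepted", JSON_HEADERS)
    return [b'{"status": "queued"}']


def worker(handle):
    # The single consumer: takes tasks off the queue one at a time.
    while True:
        task = tasks.get()
        try:
            handle(task)
        finally:
            tasks.task_done()


def serve(handle, host="127.0.0.1", port=8001):
    # Start the consumer thread, then serve requests until interrupted.
    threading.Thread(target=worker, args=(handle,), daemon=True).start()
    make_server(host, port, app).serve_forever()
```

Calling `serve(print)` would start a backend that prints each task it receives; the front-end processes would then POST JSON to `http://127.0.0.1:8001/tasks` (a hypothetical address) instead of doing the work inline.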

If you run into serious scalability issues, you simply wrap your backend server in lighttpd or some other web engine to create a multi-threaded backend.