
What's the best pattern to design an asynchronous RPC application using Python, Pika and AMQP?

The producer module of my application is run by users who want to submit work to be done on a small cluster. It sends the submissions in JSON form through the RabbitMQ message broker.

I have tried several strategies, and the best so far is the following, which is still not fully working:

Each cluster machine runs a consumer module, which subscribes itself to the AMQP queue and issues a prefetch_count to tell the broker how many tasks it can run at once.

I was able to make it work using SelectConnection from the Pika AMQP library. Both consumer and producer start two channels, one connected to each queue. The producer sends requests on channel [A] and waits for responses on channel [B], and the consumer waits for requests on channel [A] and sends responses on channel [B]. It seems, however, that when the consumer runs the callback that calculates the response, it blocks, so only one task is executed on each consumer at a time.

What I need in the end:

  1. the producer [A] submits its tasks (around 5k each time) to the cluster
  2. the broker dispatches N messages/requests to each consumer, where N is the number of concurrent tasks it can handle
  3. when a single task finishes, the consumer replies to the broker/producer with the result
  4. the producer receives the replies, updates the computation status and, in the end, prints some reports
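The four steps above can be sketched end to end in plain Python, simulating the broker's two queues with in-process `queue.Queue` objects and tagging each task with a correlation id so out-of-order replies can be matched back. All names here are illustrative stand-ins, not Pika API:

```python
import queue
import threading

# Stand-ins for the broker queues on channels [A] and [B].
requests = queue.Queue()   # producer -> consumers
responses = queue.Queue()  # consumers -> producer

N_WORKERS = 4  # per-consumer concurrency (the prefetch_count idea)

def consumer_worker():
    while True:
        corr_id, payload = requests.get()
        if corr_id is None:          # shutdown sentinel
            break
        result = payload * 2         # stand-in for the real computation
        responses.put((corr_id, result))

# Step 2: N concurrent workers stand in for one consumer's parallel slots.
workers = [threading.Thread(target=consumer_worker) for _ in range(N_WORKERS)]
for w in workers:
    w.start()

# Step 1: the producer submits tasks, each tagged with a correlation id.
tasks = {i: i for i in range(10)}
for corr_id, payload in tasks.items():
    requests.put((corr_id, payload))

# Steps 3-4: collect replies in any order, matching them by correlation id.
status = {}
for _ in tasks:
    corr_id, result = responses.get()
    status[corr_id] = result

for _ in workers:
    requests.put((None, None))
for w in workers:
    w.join()

print(status[3])  # 6
```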

Restrictions:

  • If another user submits work, all of his tasks will be queued after the previous user's (I guess this is automatically true given the queue system, but I haven't thought about the implications in a threaded environment)
  • Tasks must be submitted in order, but the order in which they are answered is not important

UPDATE

I have studied a bit further, and my actual problem seems to be that I use a simple function as the callback for Pika's SelectConnection.channel.basic_consume() function. My last (unimplemented) idea is to pass a threaded function instead of a regular one, so the callback would not block and the consumer could keep listening.
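That idea can be sketched as follows: the consume callback only hands the message off to a thread and returns immediately, so the SelectConnection I/O loop is never blocked. `FakeChannel` is a stand-in so the example runs without a broker; with real Pika, the worker thread must hand the ack back to the I/O loop (e.g. via `connection.add_callback_threadsafe()`) rather than touching the channel directly:

```python
import threading
import time

class FakeChannel:
    """Stand-in for a Pika channel, just to make the sketch runnable."""
    def __init__(self):
        self.acked = []
    def basic_ack(self, delivery_tag):
        self.acked.append(delivery_tag)

def long_running_task(channel, delivery_tag, body):
    time.sleep(0.1)                  # simulate real work
    channel.basic_ack(delivery_tag)  # in real Pika, marshal this back via
                                     # connection.add_callback_threadsafe()

def on_message(channel, delivery_tag, body):
    # Do NOT run the work here: spawn a thread and return immediately,
    # so the single-threaded I/O loop can keep dispatching messages.
    t = threading.Thread(target=long_running_task,
                         args=(channel, delivery_tag, body))
    t.start()
    return t

chan = FakeChannel()
threads = [on_message(chan, tag, b"payload") for tag in range(3)]
start = time.time()
for t in threads:
    t.join()
elapsed = time.time() - start
# Three 0.1 s tasks overlap instead of running serially.
print(sorted(chan.acked), round(elapsed, 1))
```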

As you have noticed, your process blocks when it runs a callback. There are several ways to deal with this, depending on what your callback does.

If your callback is IO-bound (doing lots of networking or disk IO), you can use either threads or a greenlet-based solution, such as gevent, eventlet, or greenhouse. Keep in mind, though, that Python is limited by the GIL (Global Interpreter Lock), which means that only one piece of Python code is ever running in a single Python process. This means that if you are doing lots of computation with Python code, these solutions will likely not be much faster than what you already have.
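To illustrate the IO-bound case, plain threads already overlap blocking waits despite the GIL; here is a minimal sketch using `concurrent.futures`, with `time.sleep` standing in for a network or disk call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_bound_task(i):
    time.sleep(0.1)   # stands in for a network/disk call that releases the GIL
    return i * i

start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(io_bound_task, range(5)))
elapsed = time.time() - start

# Five 0.1 s waits overlap, taking ~0.1 s total rather than ~0.5 s.
print(results, round(elapsed, 1))
```

With a CPU-bound body instead of `time.sleep`, the GIL would serialize the work and the speedup would largely disappear, which is the limitation described above.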

Another option would be to implement your consumer as multiple processes using multiprocessing. I have found multiprocessing to be very useful when doing parallel work. You could implement this either by using a Queue, with the parent process being the consumer and farming out work to its children, or by simply starting multiple processes which each consume on their own. I would suggest, unless your application is highly concurrent (thousands of workers), simply starting multiple workers, each of which consumes from its own connection. This way, you can use the acknowledgement feature of AMQP, so if a consumer dies while still processing a task, the message is sent back to the queue automatically and will be picked up by another worker, rather than the request simply being lost.
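A rough sketch of the first variant (the parent process farming work out to children), with multiprocessing queues standing in for the AMQP connection. In a real consumer, each child would instead hold its own connection and ack each message only after finishing, so a crashed worker's task gets redelivered:

```python
import multiprocessing as mp

def worker(task_q, result_q):
    """One worker process; with real AMQP it would hold its own connection
    and ack each message only after the task completes."""
    while True:
        task = task_q.get()
        if task is None:                      # shutdown sentinel
            break
        result_q.put((task, task ** 2))       # stand-in for the real work

def run_cluster(n_workers=3, n_tasks=6):
    ctx = mp.get_context("fork")              # fork keeps the sketch self-contained
    task_q, result_q = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(task_q, result_q))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for t in range(n_tasks):                  # parent "consumes" and farms out
        task_q.put(t)
    results = dict(result_q.get() for _ in range(n_tasks))
    for _ in procs:                           # one sentinel per worker
        task_q.put(None)
    for p in procs:
        p.join()
    return results

results = run_cluster()
print(results[5])  # 25
```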

A last option, if you control the producer and it is also written in Python, is to use a task library like celery to abstract the task/queue workings for you. I have used celery on several large projects and have found it to be very well written. It will also handle multiple-consumer issues for you with the appropriate configuration.

Your setup sounds good to me. And you are right: you can simply set the callback to start a thread, and chain that to a separate callback, invoked when the thread finishes, that queues the response back over channel B.

Basically, your consumers should have a queue of their own (of size N, the amount of parallelism they support). When a request comes in via channel A, it should be stored in the queue shared between the main thread running Pika and the worker threads in the thread pool. As soon as it is queued, Pika should respond with an ACK, and a worker thread will wake up and start processing.

Once the worker is done with its work, it queues the result on a separate result queue and issues a callback to the main thread to send it back to the consumer.
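That design can be sketched without Pika, using a bounded `queue.Queue` of size N as the shared work queue and a second queue for results; the points where the ack and the reply publish would happen on the main (Pika) thread are marked in comments:

```python
import queue
import threading

N = 4                               # parallelism this consumer supports
work_q = queue.Queue(maxsize=N)     # bounded: at most N requests in flight
result_q = queue.Queue()

def worker():
    while True:
        item = work_q.get()
        if item is None:            # shutdown sentinel
            break
        tag, body = item
        result_q.put((tag, body.upper()))   # stand-in for the real task
        work_q.task_done()

threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()

# Main (Pika) thread: on each delivery, enqueue the request and ack.
for tag, body in enumerate([b"a", b"b", b"c", b"d", b"e"]):
    work_q.put((tag, body))         # blocks if all N workers are busy
    # channel.basic_ack(tag)        # in real code, ack here on the Pika thread

# Drain results; in real code a main-thread callback would publish each
# result back over channel B instead of collecting it locally.
results = sorted(result_q.get() for _ in range(5))

for _ in threads:
    work_q.put(None)
for t in threads:
    t.join()
print(results)
```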

You should take care to make sure that the worker threads are not interfering with each other if they use any shared resources, but that's a separate topic.

Being inexperienced in threading, my setup would instead run multiple consumer processes (the number of which basically being your prefetch count). Each would connect to the two queues and happily process jobs, unaware of each other's existence.
