
AMQP gem for Rails is requeuing hundreds of successfully processed messages

Why, when I have a lot of messages on a queue (1200), are my messages being requeued even though my code processes them successfully and "acks" them?

AND

How can I fix this?

.. ..

I have an application that uses the Rails amqp gem to make use of RabbitMQ. We put messages on a queue with information about emails that need to be sent, and the subscriber takes them off and sends them.

Sometimes hundreds of messages will be placed on the queue in quick succession.

We use acknowledgements to be sure that messages are not lost.

It was working very well until recently when I found that there were 1200 messages on the queue and they were not being consumed.

So why was my consumer not consuming them?

Looking at the logs I found that yes, it had consumed them and the emails were sent. I restarted the consumer and it reconsumed them, meaning we sent multiple copies of the same email to users. Yikes! But what I noticed by watching the RabbitMQ UI was that when I restarted the consumer, it took all 1200 messages off the queue at once. Then after a few minutes, these messages were requeued, even though my consumer was still going through them and sending the emails. In our code, the consumer does ack the message after each email is sent (message processed).

So my best guess at what is happening is that when there are lots of messages on the queue, the consumer takes them all off, but does not ack each one separately and instead waits until all the messages have been processed before doing a mass ack. As this takes a long time (10 minutes), something on the RabbitMQ side decides this is taking too long and requeues all those messages, even while my consumer is still processing them successfully.

I have looked around a lot and found something called a heartbeat, but I cannot find any clear explanation of what this is and how to use it, if I need to use it at all. But it sounds like it could be related to communication between the queue and the consumer and could be the key to not having all those messages requeued while they are being processed.
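From what I can gather, a heartbeat is a lightweight frame exchanged between the client and RabbitMQ so that dead TCP connections get detected; it does not appear to control redelivery by itself. If it turns out to be relevant, the amqp gem seems to accept a :heartbeat option (in seconds) in the connection settings, so the change would look something like this (a sketch only, assuming that option name from the gem docs):

tcp_connection_settings =
    {:host=>"localhost",
     :port=>5672,
     :vhost=>"dev_vhost",
     :user=>"dev_user",
     :pass=>"abc123",
     :heartbeat=>30,   # exchange heartbeat frames roughly every 30 seconds (assumed option name)
     :ssl=>false,
     :logging=>true}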

Another thing I tried was using prefetch: 1, described here. It does not seem appropriate because I only have one consumer, but it sounded hopeful because it looked as though it might force one-by-one acknowledgement of messages.

Should I consider multiple consumers given that we could get hundreds of messages placed on the queue in quick succession?

Here is my rake task to subscribe to the queue:

task :subscribe_basic => :environment do |task_name|
  begin # make sure any exception is logged
    log = Rails.logger
    routing_key = "send_letter"
    tcp_connection_settings =
        {:host=>"localhost",
         :port=>5672,
         :vhost=>"dev_vhost",
         :user=>"dev_user",
         :pass=>"abc123",
         :timeout=>0.3,
         :ssl=>false,
         :on_tcp_connection_loss=>
             handle_conn_loss,
         :logging=>true}

    begin
      ::AMQP.start(tcp_connection_settings) do |connection|
        channel = ::AMQP::Channel.new(connection, :prefetch => 1)
        # binding.pry  # debug breakpoint; commented out so the subscriber does not block
        channel.auto_recovery = true
        cons = SendLetterConsumer.new channel, log

        queue = channel.queue(routing_key, exclusive: false, durable: true)

        consumer1 = AMQP::Consumer.new(channel, queue, nil, exclusive = false, no_ack = false)
        consumer1.consume.on_delivery(&cons.method(:handle_message))
        log.info "subscribed to queue #{routing_key} (#{Process.pid})"

        Signal.trap 'INT' do # kill -s INT <pid> , kill -2 <pid>,  Ctrl+C
          log.info "#{task_name} stopping(#{Process.pid})..."
          channel.close { EventMachine.stop } # otherwise segfault
        end
      end
    rescue StandardError => ex
      # 2015-03-20 02:52:49 UTC MQ raised EventMachine::ConnectionError: unable to resolve server address
      log.error "MQ raised #{ex.class.name}: #{ex.message} Backtrace: #{ex.backtrace}"
    end
  rescue Exception => ex
    log.error "#{ex.class.name}: #{ex.message} -- #{ex.backtrace.inspect}"
    raise ex
  end

end

Here is the consumer code we use to handle the message (called in the code above via consumer1.consume.on_delivery(&cons.method(:handle_message))):

def handle_message(metadata, payload)
  logger.info "*** SendLetterConsumer#handle_message start #{Time.now}"
  logger.info payload
  begin
    # {course_app: aCourseApplication, errors:[]}
    # {course_app: aFaultyCourseApplication, errors: ['error1', 'error2']}
    msg = JSON.parse(payload)
    ca = CourseApplication.find(msg['course_application_id'])
    am = AutomatedMessage.find_by(id: msg['automated_message_id']) # find_by, so a missing id falls through to the elsif below
    user_name = msg['user_name']
    if am.present?
      raise "Cannot send a letter for Automated message with id #{am.id} because it does not have an associated message template" if am.message_template.nil?
      logger.info "attempt to send letter for Automated Message with id #{am.id}"
      result = LetterSender::send_letter a_course_application: ca, a_message_template: am.message_template, user_name: user_name
    elsif msg['message_template_id'] # msg is a Hash parsed from JSON, so use hash access
      mt = MessageTemplate.find(msg['message_template_id'])
      result = LetterSender::send_letter a_course_application: ca, a_message_template: mt, user_name: user_name
    end
    if result
      metadata.ack #'ack'-ing will remove the message from the queue - do this even if we created a faultyCourseApp
    else
      logger.error "Could not ack for #{msg}"
    end
  rescue StandardError => e
    logger.error "#{e.message} #{e.backtrace}"
    # do not 'ack' - must be a programming mistake so leave message on queue - keep connection open to cont processing other messages
    # fix bug and restart the rake task to redeliver the unacknowledged messages
  end
  logger.info "*** SendLetterConsumer#handle_message   end #{Time.now}"
end    

prefetch was indeed the answer, but the doc I linked to above says to configure it by using:

channel = AMQP::Channel.new(connection, :prefetch => 1)

but this did not work at all.

I had to do this

channel    = AMQP::Channel.new(connection)
channel.prefetch(1)

and now it works, dispatching only one message and waiting until it is acked before the next is dispatched.

This solution is described here in the RabbitMQ tutorial, not the amqp gem docs.
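Putting it together, a minimal sketch of the working setup (not my exact rake task; the queue name and connection settings are placeholders) looks something like this, with prefetch(1) plus a per-message ack so the next message is only dispatched after the previous one is acked:

require "amqp"

AMQP.start(:host => "localhost") do |connection|
  channel = AMQP::Channel.new(connection)
  channel.prefetch(1)                      # at most one unacked message in flight

  queue = channel.queue("send_letter", :durable => true)
  queue.subscribe(:ack => true) do |metadata, payload|
    # ... process the message (e.g. send the email) ...
    metadata.ack                           # only now will RabbitMQ dispatch the next message
  end
end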

So what happens if I have only one consumer with prefetch, and it fails to ack a message? Will messages start piling up?

YES

So it may be good to have 2 consumers, but then both of those consumers might fail to ack.

To deal with this, I am trying reject and requeue. So in my consumer, if I do not hit the section of code where I ack the message, I use metadata.reject(:requeue=>true) and this puts the message back on the front of the queue. Yes, that's right, the "front" of the queue - bummer. This means messages will still pile up as the same failing message is continually dispatched to the one consumer.

As the former link above says, "When there is only one consumer on a queue, make sure you do not create infinite message delivery loops by rejecting and requeueing a message from the same consumer over and over again."

Why doesn't requeue put it on the end of the queue? Wouldn't that be better? You would still get looping messages, but at least the new messages would get processed rather than pile up.

So I tried setting the prefetch to more than one... two. But same problem: as soon as 2 messages are rejected and requeued, my poor old consumer keeps getting those same ones delivered to it, rather than getting ones it has not rejected, which would give it a chance to process the backlog of messages.

How about multiple consumers? Same problem. I have 2 consumers that prefetch x messages and metadata.reject(requeue:true) them if something goes wrong. Now if the front 2x messages are causing errors in my consumers, I get into the same problem of infinitely looping messages while new messages back up. If fewer than 2x messages at the front of the queue consistently fail to be acked, then the consumers gradually get through the backlog of messages.

It seems there is no satisfactory solution.

Ideally I would like my prefetching consumers (prefetch being necessary due to the initial problem) to be able to not ack a message that they fail to consume properly, but also to move on to the next message in the queue. In other words, leave the bad ones in the unacknowledged messages collection rather than put them back on the queue. The problem is that with prefetch I have to reject them or else everything stops, and I have to requeue them or else I lose them.

One approach might be: in my consumer, when a redelivered message fails to be consumed properly in the code, I will reject it but not requeue it, using metadata.reject(), and somehow report this message to a developer or save it in a failed-message table in the db so we can deal with it later. (Regarding the redelivered flag metadata.redelivered, see the "At The Consumer" section here.)
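A rough sketch of that idea, called from the rescue branch of handle_message (FailedMessage here is a hypothetical ActiveRecord model used only for illustration, not something that exists in our app):

def handle_failure(metadata, payload, error)
  if metadata.redelivered
    # already redelivered once and failed again: keep it off the queue
    # and record it so a developer can deal with it later
    FailedMessage.create!(payload: payload, error: error.message)  # hypothetical model
    metadata.reject(requeue: false)
  else
    # first failure: put it back for one more attempt
    metadata.reject(requeue: true)
  end
end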

It would be wonderful if rabbitmq provided a redelivery count - then I could set a higher cutoff before giving up on requeuing - but it does not seem to do so; it only provides a redelivered flag.

My other answer said that prefetch works to solve the problem, but introduces a new one: with prefetch you must then reject and requeue the messages that are failing, and this leads to loops because reject(requeue:true) puts the message back on the front of the queue, only to be consumed again. Multiple consumers help a bit, but you can still get into loops.

So in order to use prefetch but put failing messages on the back of the queue, I have found that using a dead-letter-exchange setup works. See this article about it (it is for C#, but you can see the general idea). Also see the RabbitMQ doc about Dead Letter Exchanges.

I did not grok it at first, so here is my short explanation of using the dead letter exchange for this situation:

RabbitMq does not do delayed messages, so the idea is to use a retry queue and publish messages that fail in the consumer onto this retry queue. In turn, this retry queue will kill them after a certain time, causing them to be put on the end of the main queue.

  1. The consumer tries to consume the message.

  2. Something goes wrong, or you catch an error, so you do not ack (metadata.ack) the message; instead you metadata.reject(requeue:false) it and publish it to the retry queue.

With a dead letter exchange configuration for this retry queue, what happens is this:

  1. The message sits on the retry queue for time period x (set via the "x-message-ttl" argument when creating the retry queue, see below), then RabbitMq kills it.

  2. Due to the dead letter exchange setup configured on the retry queue using the arguments "x-dead-letter-exchange" and "x-dead-letter-routing-key" (see below), the message automatically goes back onto the back of the main queue.

A great thing about this is that the retry queue does not even need any consumers.

Here is some code I put in my consumer to publish to the retry queue:

def publish_to_retry_queue(msg:, metadata:)
  # declare the retry queue (idempotent); when a message's TTL expires it is
  # dead-lettered back to the main queue via dead_letter_exchange / "send_letter"
  @channel.queue("send_letter.retry", exclusive: false, durable: true,
                 arguments:{"x-dead-letter-exchange" => "dead_letter_exchange",
                            "x-dead-letter-routing-key" => "send_letter",
                            "x-message-ttl" => 25000})
  metadata.reject(requeue: false)
  res = @channel.default_exchange.publish(msg, routing_key: "send_letter.retry", headers: metadata.headers)
  @logger.info "result from publishing to retry queue is"
  @logger.info  res
  res
end

where @channel is the channel that the consumer of the main queue is using. NOTE: this requires that you have already set up the exchange called dead_letter_exchange on rabbitmq and added a binding from it to the main queue, which in this case is the send_letter queue.
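For completeness, that one-time setup could look roughly like this with the amqp gem (a sketch; the direct exchange type and the durability flags are my assumptions, and the names simply match the arguments used above):

# one-off setup sketch: a durable direct exchange named "dead_letter_exchange",
# bound to the main send_letter queue with the routing key from
# "x-dead-letter-routing-key" above
dead_letter_exchange = channel.direct("dead_letter_exchange", durable: true)
main_queue           = channel.queue("send_letter", exclusive: false, durable: true)
main_queue.bind(dead_letter_exchange, routing_key: "send_letter")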
