
Spring Integration issue on RabbitMQ cluster restart

We have several RabbitMQ queues in our system, and we use the Spring Integration amqp:inbound-channel-adapter to consume the messages. The Spring application runs on 5 JBoss nodes (not clustered).

On the RabbitMQ side there is a two-node cluster behind a load balancer, with durable queues. On the application side the listener definitions are quite simple, with a connection factory defined as follows:

<rabbit:connection-factory id="amqpConnectionFactory"
    username="${orts.rabbitmq.username}" password="${orts.rabbitmq.password}"
    host="${orts.rabbitmq.endpoint}" />

and several inbound-channel-adapters defined like the following:

<amqp:inbound-channel-adapter id="artiqAmqpInboundChannelAdapter"
    channel="artiq.queued.action.filter.outbound.channel"
    error-channel="artiq.recovery.router.channel"
    connection-factory="amqpConnectionFactory"
    header-mapper="amqpHeaderMapper"
    queue-names="ortsArtiqQueue" />

We have run into unexpected behavior whenever we have to restart the RabbitMQ cluster for some reason (e.g. to deploy a new configuration): after the restart, one or more of the listeners stop consuming messages, and we have to restart the JBoss nodes to recover.

Note that this behavior is not tied to a specific queue; the affected queues can be different each time. Also note that the new configuration we deploy doesn't modify any of the existing queues (it happened, for example, when we added new queues).

> the listeners stop consuming messages and we have to restart JBoss nodes to recover.

In my experience, such problems are invariably caused by the listener container thread being "stuck" in some code downstream of the adapter.
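
To make the "stuck thread" failure mode concrete, here is a minimal plain-Java sketch (no Spring or RabbitMQ involved) that models a listener-container consumer as a single thread draining a queue one message at a time. The class, the thread name, and the downstreamHandler method are hypothetical stand-ins for whatever sits downstream of the adapter; here the handler waits on a latch that is never released, so the consumer stops making progress and the backlog grows, even though no exception is ever thrown.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class StuckConsumerDemo {

    // Hypothetical downstream code: it waits on a latch that nobody ever releases,
    // standing in for any call that never returns.
    static void downstreamHandler(String message, CountDownLatch neverReleased)
            throws InterruptedException {
        neverReleased.await(); // blocks indefinitely: the consumer thread is now "stuck"
    }

    // Models one consumer thread: take a message, hand it downstream,
    // and only then take the next one.
    static Thread startConsumer(BlockingQueue<String> queue, CountDownLatch neverReleased) {
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String message = queue.take();
                    downstreamHandler(message, neverReleased);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly when interrupted
            }
        }, "demoConsumer-1");
        consumer.setDaemon(true); // do not keep the JVM alive because of this thread
        consumer.start();
        return consumer;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 5; i++) {
            queue.put("msg-" + i); // messages waiting to be consumed
        }
        Thread consumer = startConsumer(queue, new CountDownLatch(1));
        TimeUnit.MILLISECONDS.sleep(200);
        // The consumer has taken one message and is parked inside downstreamHandler;
        // the remaining messages are never consumed.
        System.out.println("state=" + consumer.getState() + ", backlog=" + queue.size());
    }
}
```

The thread is still alive the whole time, so from the outside it simply looks as if consumption stopped; only its stack trace (for example consumer.getStackTrace(), or a jstack dump) shows that it is parked inside downstreamHandler.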

To debug, take a thread dump (e.g. with jstack) the next time it happens and look at what the consumer threads are doing.
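
If attaching jstack to a JBoss node is inconvenient, a similar dump restricted to the consumer threads can be produced from inside the application's own JVM with the standard java.lang.management.ThreadMXBean API. This is a minimal sketch, not a Spring feature: ConsumerThreadDump and its name-prefix filter are hypothetical, it only sees threads of the JVM it runs in (so it would have to be invoked from within the application), and the prefix to pass depends on how your listener containers name their threads, so check a full jstack dump first rather than guessing a name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ConsumerThreadDump {

    // Builds a jstack-like text dump of the live threads whose names start with
    // namePrefix. The prefix is an assumption: check a full jstack dump to see
    // what your listener container actually calls its consumer threads.
    static String dumpThreads(String namePrefix) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Request lock details only where this JVM supports them.
        ThreadInfo[] infos = threads.dumpAllThreads(
                threads.isObjectMonitorUsageSupported(),
                threads.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null || !info.getThreadName().startsWith(namePrefix)) {
                continue;
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" on ").append(info.getLockName()); // what it is waiting for
            }
            out.append('\n');
            // Print every frame ourselves, since ThreadInfo.toString() may truncate long stacks.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Demo only: dumps this JVM's own threads matching the given prefix (default "main").
        System.out.print(dumpThreads(args.length > 0 ? args[0] : "main"));
    }
}
```

Each matching thread is reported with its state, the lock it is waiting on (if any), and its full stack; the interesting part is usually the application frames below the container's own frames, which show where the thread is waiting.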

It doesn't sound like this is your problem, but we did recently fix a bug that caused a similar problem when adding queues to, or removing queues from, an existing listener container. If you are not doing that, then that fix won't help you; you need to look at the thread dump to see what's happening.
