
Apache Flume - Ingesting data from single message queue by multiple consumers

I am currently developing an Apache Flume agent that ingests data from a single message queue (Solace). Because the messages are large, processing each one is slow, and there will be a lot of them, so I am thinking of running multiple agents to consume from the queue. The challenge is that multiple agents might take the same message, resulting in duplicates in the sink (a landing bucket). This could happen if, while one agent is still processing a message (not yet acknowledged), another agent takes that same message from the queue. Please share any similar experience or ideas for solving this issue. Thanks.

You can use a non-exclusive queue to distribute messages among your multiple agents (consumers) in a round-robin fashion.

There won't be any duplicates unless some underlying error occurs (such as one of the consumers disconnecting) that causes the Solace message broker to redeliver messages that were delivered but not acknowledged to another consumer.

In this case, the JMS message is marked as redelivered, and your application has to perform some logic to handle this possibly duplicate message.
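One way to handle the redelivered case can be sketched as follows. In JMS the flag and the ID are available through `Message.getJMSRedelivered()` and `Message.getJMSMessageID()`; everything else here is an assumption made for illustration. `RedeliveryGuard`, `shouldProcess` and `markCompleted` are hypothetical names, not part of Flume, JMS, or the Solace API, and the in-memory set stands in for whatever store all agents can see (for example, a check on whether the object already exists in the landing bucket).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Decides whether a message still needs processing, based on its redelivered flag.
 * Hypothetical helper for illustration only.
 */
final class RedeliveryGuard {
    // IDs of messages whose output was fully written to the sink.
    // Assumption: with several agents this would be a store shared by all of them.
    private final Set<String> completedIds = ConcurrentHashMap.newKeySet();

    /** Returns true if the message should be processed. */
    boolean shouldProcess(String messageId, boolean redelivered) {
        // A first-time delivery is always processed.
        if (!redelivered) {
            return true;
        }
        // A redelivered message is a duplicate only if its output already landed.
        return !completedIds.contains(messageId);
    }

    /** Call after the sink write succeeds and before acknowledging the message. */
    void markCompleted(String messageId) {
        completedIds.add(messageId);
    }
}
```

An agent would call `shouldProcess(...)` for each incoming message, and when it returns false simply acknowledge the message without writing to the sink again. The extra lookup only happens on redelivered messages, so the normal path stays cheap.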


While not strictly necessary, it's probably also a good idea to think about handling "poison" messages that the application cannot process successfully and that could cause an endless redeliver-fail, redeliver-fail, redeliver-fail loop.

You can do this by ensuring that your messages are dead-message-queue eligible, configuring a dead-message-queue, and adjusting the "Max Redelivery" setting on the queue.

This ensures that "poison" messages are eventually moved to the dead-message-queue instead of being redelivered to your application indefinitely.
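The effect of a redelivery limit can be illustrated with a toy model. This is only a sketch of the idea, not broker code and not the Solace API: `MaxRedeliverySketch`, `drain` and the `handler` predicate are hypothetical, and the model assumes `maxRedelivery` counts the redeliveries allowed after the first delivery attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of a queue with a redelivery limit and a dead-message list. */
final class MaxRedeliverySketch {
    /**
     * Offers each message to the handler until the handler returns true.
     * A message that fails on the first attempt and on every one of the
     * maxRedelivery redeliveries is added to the returned dead-message list.
     */
    static List<String> drain(List<String> messages, int maxRedelivery,
                              Predicate<String> handler) {
        List<String> deadMessages = new ArrayList<>();
        for (String message : messages) {
            boolean handled = false;
            // Attempt 0 is the first delivery; attempts 1..maxRedelivery are redeliveries.
            for (int attempt = 0; attempt <= maxRedelivery && !handled; attempt++) {
                handled = handler.test(message);
            }
            if (!handled) {
                deadMessages.add(message); // stops the redeliver-fail loop
            }
        }
        return deadMessages;
    }
}
```

With a limit of 3 and a handler that always fails for one particular message, that message is attempted four times in this model and then set aside, while the other messages are processed normally.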
