I searched for a good pattern to implement here, and couldn't find anything.
First, I have multiple nodes in a cluster subscribing to a topic. Because I am interfacing with an external API, I cannot change this topic to a queue (which would solve my problems). When a message goes into this topic, the subscribers react, but I need to ensure that only one subscriber actually does any work.
I have multiple nodes for durability and for scalability. I thought about just electing a master node, but over time there will be multiple topics, and I do not want to make only one node responsible for all messages all the time. Hazelcast is not a requirement here.
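import static java.lang.Math.toIntExact;

import java.util.List;
import java.util.concurrent.locks.Lock;
import javax.inject.Inject;
import javax.inject.Named;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
// Hazelcast 3.x package names
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IAtomicLong;

@Named
public class MessageProcessorImpl
    implements MessageProcessor
{
    private static final Logger logger = LoggerFactory.getLogger(MessageProcessorImpl.class);

    private final HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
    private final Lock lock;
    private final List<Message> messageListCache;
    private final IAtomicLong cachePositionCounter;
    // Maximum number of recently processed messages remembered across the cluster.
    private final long maximumRecentlyProcessedCachedSize = 10L;
    private final ExternalMessageService externalMessageService;

    @Inject
    public MessageProcessorImpl(final ExternalMessageService externalMessageService)
    {
        lock = hazelcastInstance.getLock("test-lock");
        messageListCache = hazelcastInstance.getList("test-list");
        cachePositionCounter = hazelcastInstance.getAtomicLong("test-atomic-long");
        this.externalMessageService = externalMessageService;
    }

    @Override
    public void processMessage(final Message message) {
        logger.trace("Acquiring lock");
        // Acquire outside the try block so that finally never unlocks a lock we do not hold.
        lock.lock();
        try {
            if (!messageListCache.contains(message)) {
                // Ring buffer: reuse slots 0..max-1 so the list never grows past the maximum.
                int slot = toIntExact(cachePositionCounter.getAndIncrement() % maximumRecentlyProcessedCachedSize);
                if (slot < messageListCache.size()) {
                    // Overwrite the oldest entry instead of inserting and shifting the rest.
                    messageListCache.set(slot, message);
                } else {
                    messageListCache.add(message);
                }
                externalMessageService.doSomething(message);
            }
        }
        finally {
            logger.trace("releasing lock");
            lock.unlock();
        }
    }
}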
As you can see, I am using a list of recently processed messages to prevent duplicate work. The problem here is obvious: what if that list is overwhelmed? I could set the cache size relatively high, but not infinite, so that the list doesn't grow forever. Also, there is some overhead in checking whether a message is in a list.
Is there a better solution, or a way I could avoid the edge case of that list being overwhelmed and causing duplicate work? I'm not even sure that's a valid concern; it's difficult to reason about. Is there a different approach I should try?
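To make the bounded-cache idea concrete, here is a minimal single-JVM sketch of a fixed-size "recently seen" set with constant-time lookups. It assumes each message carries a stable string ID, which the code above does not show; `RecentMessageIds` and `markIfNew` are hypothetical names. In a cluster the same role would have to be played by a shared structure, for example a distributed map with eviction or a TTL.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Bounded set of recently seen message IDs; evicts the least recently used ID when full. */
class RecentMessageIds {
    private final Set<String> ids;

    RecentMessageIds(final int capacity) {
        // accessOrder = true keeps the map in LRU order; removeEldestEntry caps its size.
        this.ids = Collections.newSetFromMap(
            new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > capacity;
                }
            });
    }

    /** Returns true only if the ID is not currently cached, and records it. */
    synchronized boolean markIfNew(final String messageId) {
        return ids.add(messageId);
    }
}
```

A caller would only do the work when `markIfNew(id)` returns true. The capacity still has to exceed the number of distinct messages that can arrive within the window in which duplicates are possible; otherwise an evicted ID will be treated as new again.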
This answer is very late. However, the pattern that might help in this case is leader election. That is, one node is elected to process a message from the topic while the others wait until the message has been processed successfully. The leadership changes with each message.
Apache ZooKeeper has facilities for distributed locks and leader election; refer here
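The per-message election described above can be sketched roughly as follows. This is not ZooKeeper code: a local `ConcurrentHashMap` stands in for whatever shared store the cluster uses, and `MessageClaims`, `tryClaim`, the message ID, and the node ID are all hypothetical names chosen for illustration. The point is only that the first node to atomically claim a message ID wins and the others back off.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Per-message election: the first node to claim a message ID becomes its processor. */
class MessageClaims {
    // Stand-in for a store shared by all nodes (assumption: a real cluster
    // would use a distributed map or a coordination service here).
    private final ConcurrentMap<String, String> ownerByMessageId = new ConcurrentHashMap<>();

    /** Atomically claims the message; returns true only for the winning node. */
    boolean tryClaim(final String messageId, final String nodeId) {
        return ownerByMessageId.putIfAbsent(messageId, nodeId) == null;
    }

    /** Returns the node that won the claim, or null if nobody has claimed it yet. */
    String ownerOf(final String messageId) {
        return ownerByMessageId.get(messageId);
    }
}
```

Each subscriber would call `tryClaim` when the message arrives and process it only on `true`. Claims still need to expire or be released eventually, both to bound memory and so that another node can take over if the winner dies before finishing.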