
How can a Kafka consumer doing infinite retries recover from a bad incoming message?

I am a Kafka newbie, and as I was reading the docs I ran into this design question related to the Kafka consumer.

A Kafka consumer reads messages from a Kafka stream, which is made up of one or more partitions spread across one or more servers.

Let's say one of the incoming messages is corrupt and, as a result, the consumer fails to process it. When processing event logs you don't want to drop any events, so you retry indefinitely to ride out transient processing errors. In such a case of infinite retries, how can the consumer move forward? Is there a way to blacklist this message for the next retry?

I'd think it needs manual intervention: we log some message metadata (I don't know exactly what yet) to identify which message is failing, and each consumer checks Redis (or someplace else?) after n retries to see whether this message should be skipped. The blacklist doesn't have to be stored in Redis forever either, only until the consumer has skipped the message. Here's pseudocode of what I just described:

while (errorState) {
    if (blacklist.contains(msg)) {
        // skip the poisoned message and move on
        commitOffset();
        errorState = false;
    } else {
        errorState = processMessage(msg);
        if (!errorState) {
            commitOffset();
        } else {
            // log this msg so that we can add it to the blacklist
            logger.info(msg);
        }
    }
}
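
To make the idea concrete, here is a rough, untested sketch of how such a consumer loop might look with the official Java client and a Redis set as the blacklist. The topic name "events", the Redis key "consumer:blacklist", and the "topic:partition:offset" id format are just placeholders I made up, and process() stands in for the real business logic.

// Sketch: skip messages an operator has added to a Redis blacklist, otherwise retry forever.
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import redis.clients.jedis.Jedis;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class BlacklistingConsumer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "event-processor");
        props.put("enable.auto.commit", "false");   // commit only after success or an explicit skip
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Jedis redis = new Jedis("localhost", 6379)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    String msgId = rec.topic() + ":" + rec.partition() + ":" + rec.offset();
                    while (true) {
                        if (redis.sismember("consumer:blacklist", msgId)) {
                            // an operator flagged this message; skip it and move on
                            commit(consumer, rec);
                            break;
                        }
                        try {
                            process(rec.value());   // business logic; throws on a bad message
                            commit(consumer, rec);
                            break;
                        } catch (Exception e) {
                            // log the coordinates so an operator can blacklist the message
                            System.err.println("failed " + msgId + ": " + e.getMessage());
                            Thread.sleep(500);      // back off, then retry the same message
                        }
                    }
                }
            }
        }
    }

    static void commit(KafkaConsumer<String, String> consumer, ConsumerRecord<String, String> rec) {
        consumer.commitSync(Collections.singletonMap(
                new TopicPartition(rec.topic(), rec.partition()),
                new OffsetAndMetadata(rec.offset() + 1)));
    }

    static void process(String value) { /* placeholder for real processing */ }
}
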

I'd like to hear from more experienced folks to see if there are better ways to do this.

We had a requirement in our project where processing an incoming message to update a record depended on the record already being present. Due to a race condition, the update sometimes arrived before the insert. To handle such cases, we implemented a couple of approaches.

A. Manual retry with a predefined delay. The code checks whether the insert has arrived. If so, processing goes on as normal. Otherwise, it sleeps for 500 ms and tries again, repeating up to 10 times. If the message still cannot be processed after that, the code logs the message, commits the offset, and moves forward. Processing of a message is always done in a thread from a pool, so it doesn't block the main consumer thread either. However, in the worst case each message takes 5 seconds of application time.
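
Roughly, the retry task looks like the sketch below. recordExists(), applyUpdate(), the pool size, and the commitOffset callback are stand-ins for our actual application code, not part of any Kafka API.

// Sketch of approach A: poll for the missing insert up to 10 times, 500 ms apart, off the main thread.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DelayedRetryProcessor {
    private static final int MAX_RETRIES = 10;
    private static final long DELAY_MS = 500;
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    // called from the consumer loop for every "update" message
    void handleUpdate(String recordId, String update, Runnable commitOffset) {
        pool.submit(() -> {
            for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
                if (recordExists(recordId)) {      // has the insert arrived yet?
                    applyUpdate(recordId, update);
                    commitOffset.run();
                    return;
                }
                try {
                    Thread.sleep(DELAY_MS);        // wait 500 ms before the next check
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
            // worst case: 10 * 500 ms = 5 s elapsed; give up, log, and move on
            System.err.println("insert never arrived for record " + recordId + "; skipping update");
            commitOffset.run();
        });
    }

    boolean recordExists(String recordId) { return false; }   // placeholder lookup
    void applyUpdate(String recordId, String update) { }      // placeholder write
}
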

B. Recently, we refined the above solution to use a message scheduler based on Kafka. Now, if the insert has not arrived before the update, the system sends the update to a separate scheduler that operates on Kafka. This scheduler replays the message after some time. After 3 retries, we again log the message and stop scheduling or retrying. This gives us the benefit of not blocking the application threads and lets us manage when we want to replay the message.
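
The gist of approach B is shown in the simplified sketch below: instead of sleeping, the failed update is republished to a retry topic together with a retry counter and a "replay after" timestamp carried in record headers, and a separate consumer of that topic replays it when the time comes. The topic name, header keys, and delay value are illustrative; our actual scheduler differs in details, and Kafka itself has no built-in delayed delivery, so the waiting is done by the scheduler consumer.

// Sketch of approach B: defer a failed update by producing it to a retry topic with retry metadata.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.charset.StandardCharsets;
import java.util.Properties;

class RetryScheduler {
    private static final int MAX_RETRIES = 3;
    private static final long RETRY_DELAY_MS = 30_000;
    private final KafkaProducer<String, String> producer;

    RetryScheduler() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producer = new KafkaProducer<>(props);
    }

    // called when the insert has not arrived yet for this update
    void scheduleRetry(String key, String update, int retriesSoFar) {
        if (retriesSoFar >= MAX_RETRIES) {
            // give up after 3 attempts: log and stop rescheduling
            System.err.println("giving up on update for key " + key);
            return;
        }
        ProducerRecord<String, String> rec = new ProducerRecord<>("updates-retry", key, update);
        rec.headers().add("retry-count",
                Integer.toString(retriesSoFar + 1).getBytes(StandardCharsets.UTF_8));
        rec.headers().add("replay-after-ms",
                Long.toString(System.currentTimeMillis() + RETRY_DELAY_MS).getBytes(StandardCharsets.UTF_8));
        producer.send(rec);
        // a separate consumer reads "updates-retry", waits until "replay-after-ms" has passed,
        // then feeds the message back into the normal processing path
    }
}
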
