
Kafka throwing “org.apache.kafka.clients.consumer.CommitFailedException”

I have developed a Kafka consumer application using the spring-kafka library, with the default consumer configuration and manual commits.

I am running two instances of the application, each listening to a different Kafka topic. While performing load testing I observed that I get the below error in only one of the applications, under higher load:

org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:725)

I read several articles and found that if the consumer spends too much time processing messages and the broker does not receive any sign of consumer liveness, a consumer rebalance happens and the above exception is thrown for the uncommitted messages.
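The scenario above can be sketched as a plain manual-commit poll loop (a minimal sketch, assuming kafka-clients on the classpath; the broker address, topic name, group id, and `process` handler are placeholders, not from the original post):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ManualCommitLoop {

    // Hypothetical per-record handler; if this is slow, the whole batch
    // may overrun max.poll.interval.ms.
    static void process(ConsumerRecord<String, String> record) {
        // ... business logic ...
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "load-test-group");          // placeholder group id
        props.put("enable.auto.commit", "false");          // manual commits, as in the question
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                // Everything below must finish before the next poll(). If it
                // takes longer than max.poll.interval.ms, the group rebalances
                // and the commit below throws CommitFailedException.
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                consumer.commitSync(); // fails if the group already rebalanced
            }
        }
    }
}
```

This also hints at why only one instance may fail: the two instances consume different topics, so only the one whose per-batch processing time exceeds the interval triggers a rebalance.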

I have resolved the above error by setting max.poll.interval.ms to Integer.MAX_VALUE. But I am wondering why I get this error in only one of the instances, while the other instance works as expected under the same higher load.

Can anyone please share the correct root cause and an ideal value for max.poll.interval.ms, or an appropriate solution for this issue?

One cause for this could be that your poll() returns a lot of messages, and that is why it takes a long time to process all of them.
max.poll.records defines the maximum number of records returned in a single call to poll().
According to the Kafka documentation, its default is 500.
You can try setting it to something smaller and see if that solves your problem.
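Since the question uses spring-kafka, the property would go into the map handed to the consumer factory. A minimal sketch (the broker address and group id are placeholders, not from the original post):

```java
import java.util.HashMap;
import java.util.Map;

public class ConsumerProps {

    // Consumer properties as you would pass them to spring-kafka's
    // DefaultKafkaConsumerFactory; keys are the standard Kafka property names.
    static Map<String, Object> consumerProps() {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "load-test-group");          // placeholder group id
        props.put("enable.auto.commit", "false");          // manual commits, as in the question
        props.put("max.poll.records", "100");              // lowered from the default of 500
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps());
    }
}
```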

Apart from the suggestion by Yoav, if reducing the batch size is not an option, you can also try increasing the value of max.poll.interval.ms. From the Kafka docs:

The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member.
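A back-of-envelope way to pick a safe value: the interval must cover the worst-case time to process one full batch returned by poll(). A sketch with assumed illustrative numbers (the per-record cost is hypothetical, not from the original post):

```java
public class PollIntervalSizing {

    // The poll interval must be at least batch size times the worst-case
    // per-record processing time.
    static long requiredPollIntervalMs(int maxPollRecords, long worstCasePerRecordMs) {
        return maxPollRecords * worstCasePerRecordMs;
    }

    public static void main(String[] args) {
        int maxPollRecords = 500; // Kafka's default batch size
        long perRecordMs = 800;   // assumed worst-case processing time per record
        long needed = requiredPollIntervalMs(maxPollRecords, perRecordMs);
        // 500 records at 800 ms each needs 400,000 ms per batch, which exceeds
        // the default max.poll.interval.ms of 300,000 ms (5 minutes), so a
        // rebalance would occur; either raise the interval or shrink the batch.
        System.out.println("required poll interval: " + needed + " ms");
    }
}
```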

Set max.poll.interval.ms to at least 5000000 in your consumer properties, for example:

Properties props = new Properties();
props.put("max.poll.interval.ms", "5000000");
props.put("max.poll.records", "2");
props.put("session.timeout.ms", "30000");
props.put("heartbeat.interval.ms", "25000");

Hope that helps.
