Consumer CommitFailedException - time between subsequent calls to poll() was longer than the configured max.poll.interval.ms

Question

I have a problem with a Kafka consumer that throws the following exception from time to time.

ERROR [*KafkaConsumerWorker] (Thread-125) [] Kafka Consumer thread 235604751 Exception while polling Kafka.: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:820) [kafka-clients-2.3.0.jar:]
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:692) [kafka-clients-2.3.0.jar:]
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1368) [kafka-clients-2.3.0.jar:]
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1330) [kafka-clients-2.3.0.jar:]
    at *.kafka.KafkaConsumerWorker.run(KafkaConsumerWorker.java:64) [classes:]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_51]

I can't work out why this is happening, because the consumer is not processing any messages when the exception occurs. The exception occurs 2-3 times a day. Some of my consumer configuration is as follows:

allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [*]
check.crcs = true
client.dns.lookup = default
client.id = 52c94040-05d9-4b57-8006-afcc862f9b62
client.rack = 
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = TEST
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 10
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50

Implementation:

  {
        logger.info("Kafka Consumer thread {} start", hashCode());
        Consumer<String, Message> consumer = null;


        try {
            consumer = KafkaConsumerClient.createConsumer();

            while (start) {
                try {
                    ConsumerRecords<String, Message> notifications =
                        consumer.poll(300000);

                    if (!notifications.isEmpty()) {
                        //processing.....
                    }

                    consumer.commitSync();
                } catch (Exception e) {
                    logger.error("Kafka Consumer thread {} Exception while polling Kafka.", hashCode(), e);
                }
            }
            logger.info("Kafka Consumer thread {} exit", hashCode());
        } finally {
            if (consumer != null) {
                logger.info("Kafka Consumer thread {}  closing consumer.", hashCode());
                consumer.close();
            }
        }
    }

I know that with this version of the Kafka client the heartbeat is sent from a separate thread, which I assume rules out the consumer spending too much time on processing (and there is nothing to process anyway). I suspect this is related to one of the configured timeouts, but I can't work out which one.

Answer 1

Assuming you want to handle records in order, you should append events to an in-memory queue from the consumer loop, then hand that queue object off to a separate dequeuing thread that runs the processing logic.

The error suggests that whatever you're doing there is slow enough to get your consumer stopped and its group rebalanced.

I'd also recommend a higher-level library that can handle backpressure, such as the Connect / Streams API, Vert.x, SmallRye Messaging, or Akka Streams.
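
The queue hand-off described at the start of this answer could look roughly like the sketch below. It uses only java.util.concurrent and leaves the Kafka calls out, so it shows the hand-off pattern rather than a full consumer. The class name, the POISON_PILL sentinel, the queue capacity of 100 and the use of plain strings as records are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class QueueHandoffSketch {
    // Hypothetical sentinel that tells the worker to stop.
    private static final String POISON_PILL = "__STOP__";

    // Each inner list stands in for one batch returned by consumer.poll(...).
    static List<String> run(List<List<String>> batches) throws InterruptedException {
        // Bounded queue (capacity is an arbitrary choice): put() blocks when it is full.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100);
        List<String> processed = new ArrayList<>();

        // A single worker thread drains the queue, so records keep their order.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String record = queue.take();
                    if (record.equals(POISON_PILL)) {
                        break;
                    }
                    processed.add(record); // stand-in for the slow processing step
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        // The "poll loop": it only enqueues, so it can return to poll() quickly.
        for (List<String> batch : batches) {
            for (String record : batch) {
                queue.put(record);
            }
        }
        queue.put(POISON_PILL);
        worker.join(); // join() makes the worker's writes visible to this thread
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of(List.of("a", "b"), List.of("c"))));
    }
}
```

Two caveats apply to a real consumer. Offsets should only be committed for records the worker has actually finished, otherwise a crash loses them. And because put() blocks when the bounded queue is full, the poll loop can still stall; KafkaConsumer's pause() and resume() methods are one way to keep calling poll() while the worker catches up.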

Answer 2

You should set the Duration passed to Consumer#poll(Duration) lower than max.poll.interval.ms, which is the maximum time the consumer can stay idle before fetching more records. From the Kafka documentation:

 If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member

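By the time you commit your offsets, your consumer has already been considered failed, its partitions have already been revoked, and the group is rebalancing. In the question's code, poll(300000) can block for the full 300000 ms when no records arrive, which equals the configured max.poll.interval.ms, so the commitSync() that follows comes too late.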
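
As a minimal sketch of this advice, the helper below derives a poll timeout that stays well under max.poll.interval.ms. The helper name safePollTimeout, the 1-second cap and the one-tenth fraction are assumptions chosen for illustration, not values taken from Kafka; the 300000 fallback matches the value shown in the question's configuration.

```java
import java.time.Duration;
import java.util.Properties;

class PollTimeoutSketch {
    // Hypothetical helper: pick a poll timeout well below max.poll.interval.ms.
    static Duration safePollTimeout(Properties consumerProps) {
        long maxPollIntervalMs = Long.parseLong(
                consumerProps.getProperty("max.poll.interval.ms", "300000"));
        // Assumption: at most 1 second, and never more than a tenth of the interval.
        long timeoutMs = Math.max(1L, Math.min(1000L, maxPollIntervalMs / 10));
        return Duration.ofMillis(timeoutMs);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("max.poll.interval.ms", "300000");
        Duration timeout = safePollTimeout(props);
        System.out.println("poll timeout: " + timeout.toMillis() + " ms");
        // In the loop from the question, this would replace consumer.poll(300000):
        //   ConsumerRecords<String, Message> notifications = consumer.poll(timeout);
    }
}
```

With a short timeout, an idle consumer simply returns an empty batch and calls poll() again, so the interval between calls stays far below the limit. Note also that poll(long) is deprecated in favour of poll(Duration).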
