Loading offsets and metadata blocks KafkaConsumer after broker restart for a long time

We have the problem that calls to the 'poll' method of the new KafkaConsumer sometimes hang for as long as 20 to 30 minutes after one out of three Kafka brokers got restarted!

We are using a 3-broker Kafka setup (0.9.0.1). Our consumer processes use the new Java KafkaConsumer API and we are assigning to specific TopicPartitions.

For different reasons I can't show the real code here, but basically our code works like this:

import java.util.Arrays;
import java.util.Iterator;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

Properties consumerProps = loadConsumerProperties();
// bootstrap.servers=<IP1>:9092,<IP2>:9092,<IP3>:9092
// group.id=consumer_group_gwbc2
// enable.auto.commit=false
// auto.offset.reset=latest
// session.timeout.ms=30000
// key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
// value.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer

KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(consumerProps);
consumer.assign(Arrays.asList(new TopicPartition("someTopic", 0)));

while (true) {

  // THIS CALL sometimes blocks for a very long time after a broker restart
  ConsumerRecords<String, byte[]> records = consumer.poll(200);

  Iterator<ConsumerRecord<String, byte[]>> recordIter = records.iterator();
  while (recordIter.hasNext()) {
     ConsumerRecord<String, byte[]> record = recordIter.next();

     // Very fast, actually just sending a UDP packet via Netty.
     processRecord(record);

     if (lastCommitHappenedFiveOrMoreSecondsAgo()) {
       consumer.commitAsync();
     }
  }
}
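On 0.9.x the poll(timeout) argument does not bound the time spent waiting on the group coordinator, which is why the call can block far longer than 200 ms. The supported escape hatch is KafkaConsumer.wakeup(), called from another thread, which makes a blocked poll() throw WakeupException. As a minimal, Kafka-free sketch of that watchdog idea (the helper name, deadline value, and null-return convention are illustrative, not from the original post):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BlockingCallWatchdog {

    // Runs a blocking call but gives up after the supplied deadline.
    // With a real KafkaConsumer you would instead call consumer.wakeup()
    // from the watchdog thread, since KafkaConsumer is not thread-safe.
    static <T> T callWithDeadline(Callable<T> blockingCall, long millis) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = worker.submit(blockingCall);
            try {
                return future.get(millis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the stuck call
                return null;         // caller treats null as "no records this round"
            }
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A call that returns quickly completes normally ...
        String ok = callWithDeadline(() -> "records", 1000);
        System.out.println(ok);

        // ... while a call that hangs (like poll() during offset loading) is abandoned.
        String stuck = callWithDeadline(() -> { Thread.sleep(60_000); return "never"; }, 200);
        System.out.println(stuck);
    }
}
```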

kafka-topics.sh describes the __consumer_offsets topic as follows:

Topic:__consumer_offsets    PartitionCount:50   
ReplicationFactor:3 Configs:segment.bytes=104857600,
cleanup.policy=compact,compression.type=uncompressed
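Each consumer group's offsets live in exactly one of those 50 partitions, chosen by hashing the group.id: in 0.9.x the GroupMetadataManager uses abs(groupId.hashCode) modulo the partition count. A small sketch of that mapping (pure Java, no Kafka dependency; the bit mask mirrors Kafka's Utils.abs, which avoids the Integer.MIN_VALUE pitfall of Math.abs):

```java
public class OffsetsPartition {

    // Which __consumer_offsets partition holds a group's offsets.
    // Mirrors Kafka's: Utils.abs(groupId.hashCode) % offsets.topic.num.partitions
    static int partitionFor(String groupId, int numPartitions) {
        return (groupId.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // 50 partitions, as shown by kafka-topics.sh above
        int p = partitionFor("consumer_group_gwbc2", 50);
        System.out.println("__consumer_offsets partition: " + p);
    }
}
```

This is handy for checking whether the partition named in the broker log is really the one that belongs to your group.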

The server.log of the restarted broker shows that loading the offsets from a specific partition of the __consumer_offsets topic takes a long time (in this case about 22 minutes). This correlates with the time the 'poll' call of the consumer is blocked.

[2016-07-25 16:02:40,846] INFO [Group Metadata Manager on Broker 1]: Loading offsets and group metadata from [__consumer_offsets,15] (kafka.coordinator.GroupMetadataManager)
[2016-07-25 16:25:36,697] INFO [Group Metadata Manager on Broker 1]: Finished loading offsets from [__consumer_offsets,15] in 1375851 milliseconds.

I'm wondering what makes the loading process so slow and what can be done about it?

Found the reason.

The server.properties configuration files for our brokers contain the property

log.cleaner.enable=false

(by default this property is true as of version 0.9.0.1). This means that Kafka's internal compacted __consumer_offsets topic is not actually compacted, since the log cleaner is disabled. In effect some partitions of this topic grew to a size of several gigabytes, which explains the amount of time needed to read through all of the consumer-offsets data when a new group coordinator needs to refill its cache.
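The fix is therefore to stop overriding the default so the cleaner thread runs and __consumer_offsets actually gets compacted (the brokers need a restart to pick this up):

```properties
# server.properties: let the log cleaner compact topics with cleanup.policy=compact
# (true is already the default on 0.9.0.1; simply removing the =false override also works)
log.cleaner.enable=true
```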
