
Kafka topic has partitions with leader=-1 (Kafka Leader Election), while node is up and running

I have a 3-member kafka-cluster setup, and the __consumer_offsets topic has 50 partitions.

The following is the output of the describe command:

root@kafka-cluster-0:~# kafka-topics.sh --zookeeper localhost:2181 --describe
Topic:__consumer_offsets    PartitionCount:50   ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets   Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: __consumer_offsets   Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: __consumer_offsets   Partition: 2    Leader: 0   Replicas: 0 Isr: 0
    Topic: __consumer_offsets   Partition: 3    Leader: 1   Replicas: 1 Isr: 1
    Topic: __consumer_offsets   Partition: 4    Leader: -1  Replicas: 2 Isr: 2
    Topic: __consumer_offsets   Partition: 5    Leader: 0   Replicas: 0 Isr: 0
    ...
    ...

The members are nodes 0, 1 and 2.

As is obvious, the partitions whose replica is node 2 have no leader elected for them: their leader is -1.

I'm wondering what caused this issue. I restarted the kafka service on the 2nd member, but I never thought it would have this side effect.

Also, right now all nodes have been up for hours. This is the result of ls /brokers/ids:

/home/kafka/bin/zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is disabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[0, 1, 2]
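
All three brokers are registered in ZooKeeper, so it may also be worth checking which broker currently holds the controller role, since the controller (not ZooKeeper itself) is what elects partition leaders. A minimal check, using the same zookeeper-shell.sh as above (this step is an editorial addition, not part of the original post):

# Show which broker is the active controller; it performs leader
# election for all partitions. The output is JSON whose "brokerid"
# field identifies the controller.
/home/kafka/bin/zookeeper-shell.sh localhost:2181 <<< "get /controller"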

Also, there are many topics in the cluster, and node 2 is not the leader for any of them. Wherever node 2 is the only holder of the data (replication-factor=1 and the partition hosted on this node), leader=-1, as is obvious below.

Here, node 2 is in the ISR but is never the leader; since replication-factor=2, another replica can take over leadership:
Topic:upstream-t2   PartitionCount:20   ReplicationFactor:2 Configs:retention.ms=172800000,retention.bytes=536870912
    Topic: upstream-t2  Partition: 0    Leader: 1   Replicas: 1,2   Isr: 1,2
    Topic: upstream-t2  Partition: 1    Leader: 0   Replicas: 2,0   Isr: 0
    Topic: upstream-t2  Partition: 2    Leader: 0   Replicas: 0,1   Isr: 0
    Topic: upstream-t2  Partition: 3    Leader: 0   Replicas: 1,0   Isr: 0
    Topic: upstream-t2  Partition: 4    Leader: 1   Replicas: 2,1   Isr: 1,2
    Topic: upstream-t2  Partition: 5    Leader: 0   Replicas: 0,2   Isr: 0
    Topic: upstream-t2  Partition: 6    Leader: 1   Replicas: 1,2   Isr: 1,2


Here, node 2 is the only replica hosting some chunks of the data, but leader=-1:
Topic:upstream-t20  PartitionCount:10   ReplicationFactor:1 Configs:retention.ms=172800000,retention.bytes=536870912
    Topic: upstream-t20 Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: upstream-t20 Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: upstream-t20 Partition: 2    Leader: 0   Replicas: 0 Isr: 0
    Topic: upstream-t20 Partition: 3    Leader: 1   Replicas: 1 Isr: 1
    Topic: upstream-t20 Partition: 4    Leader: -1  Replicas: 2 Isr: 2
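
As an editorial aside (not from the original post): in this Kafka version, kafka-topics.sh can list exactly these leaderless partitions directly, which is handy when the cluster has many topics:

# Show only partitions whose leader is unavailable (leader=-1):
kafka-topics.sh --zookeeper localhost:2181 --describe --unavailable-partitions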

Any help with how to get the leader elected again is greatly appreciated.

It would also be great to know what implications this might have on how my brokers behave.
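
As an editorial suggestion (not something tried in the original thread): a common first step for leaderless partitions on an otherwise healthy broker is to ask the controller to re-run leader election for the preferred replicas. In Kafka 1.1.0 this is done with kafka-preferred-replica-election.sh:

# Trigger a preferred replica leader election for all partitions.
# The controller tries to hand leadership back to the first broker in
# each partition's replica list (e.g. broker 2 for "Replicas: 2"),
# provided that broker is in the ISR.
kafka-preferred-replica-election.sh --zookeeper localhost:2181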

EDIT ---

Kafka version: 1.1.0 (2.12-1.1.0). Disk space is available, around 800 GB free. The log files look pretty normal; below are the last 10 lines of the log file on node 2. Please let me know if there is anything in particular I should look for.

[2018-12-18 10:31:43,828] INFO [Log partition=upstream-t14-1, dir=/var/lib/kafka] Rolled new log segment at offset 79149636 in 2 ms. (kafka.log.Log)
[2018-12-18 10:32:03,622] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6435}, Current: {epoch:8, offset:6386} for Partition: upstream-t41-8. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:32:03,693] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6333}, Current: {epoch:8, offset:6324} for Partition: upstream-t41-3. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:38:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 10:40:04,831] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6354}, Current: {epoch:8, offset:6340} for Partition: upstream-t41-9. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:48:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 10:58:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 11:05:50,770] INFO [ProducerStateManager partition=upstream-t4-17] Writing producer snapshot at offset 3086815 (kafka.log.ProducerStateManager)
[2018-12-18 11:05:50,772] INFO [Log partition=upstream-t4-17, dir=/var/lib/kafka] Rolled new log segment at offset 3086815 in 2 ms. (kafka.log.Log)
[2018-12-18 11:07:16,634] INFO [ProducerStateManager partition=upstream-t4-11] Writing producer snapshot at offset 3086497 (kafka.log.ProducerStateManager)
[2018-12-18 11:07:16,635] INFO [Log partition=upstream-t4-11, dir=/var/lib/kafka] Rolled new log segment at offset 3086497 in 1 ms. (kafka.log.Log)
[2018-12-18 11:08:15,803] INFO [ProducerStateManager partition=upstream-t4-5] Writing producer snapshot at offset 3086616 (kafka.log.ProducerStateManager)
[2018-12-18 11:08:15,804] INFO [Log partition=upstream-t4-5, dir=/var/lib/kafka] Rolled new log segment at offset 3086616 in 1 ms. (kafka.log.Log)
[2018-12-18 11:08:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)

Edit 2 ----

Well, I've stopped the leader ZooKeeper instance, and now the 2nd ZooKeeper instance has been elected as the leader! With this, the un-elected leader issue is now resolved!

I don't know what might have gone wrong though, so any idea about "why changing the ZooKeeper leader fixes the un-elected leader issue" is very much welcome!

Thanks!

Though the root cause was never identified, it seems that the asker did find a solution:

I've stopped the leader ZooKeeper instance and now the 2nd ZooKeeper instance is elected as the leader! With this, the un-elected leader issue is now resolved!
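
An editorial note on the likely mechanism (not confirmed in the thread): restarting the ZooKeeper leader forces all ZooKeeper sessions to be re-established, which can make a stuck Kafka controller resign, so that another broker is elected controller and re-runs partition leader election. The same effect can usually be triggered more directly by deleting the controller znode:

# Force Kafka controller re-election: deleting /controller makes the
# brokers race to recreate it, and the new controller re-elects leaders
# for leaderless partitions. Use with care on a production cluster.
/home/kafka/bin/zookeeper-shell.sh localhost:2181 <<< "delete /controller"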
