
Kafka topic has partitions with leader=-1 (Kafka Leader Election), while node is up and running

I have a 3-member kafka-cluster setup, and the __consumer_offsets topic has 50 partitions.

Here is the result of the describe command:

root@kafka-cluster-0:~# kafka-topics.sh --zookeeper localhost:2181 --describe
Topic:__consumer_offsets    PartitionCount:50   ReplicationFactor:1 Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
    Topic: __consumer_offsets   Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: __consumer_offsets   Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: __consumer_offsets   Partition: 2    Leader: 0   Replicas: 0 Isr: 0
    Topic: __consumer_offsets   Partition: 3    Leader: 1   Replicas: 1 Isr: 1
    Topic: __consumer_offsets   Partition: 4    Leader: -1  Replicas: 2 Isr: 2
    Topic: __consumer_offsets   Partition: 5    Leader: 0   Replicas: 0 Isr: 0
    ...
    ...

The members are nodes 0, 1, and 2.

Evidently, the partitions whose replica is broker 2 have no leader assigned to them, and their Leader is -1.
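A quick way to enumerate the affected partitions is to filter the describe output for `Leader: -1`. Below is a minimal sketch that runs the filter over a captured sample of the output above (an assumption for illustration; against a live cluster you would pipe `kafka-topics.sh --zookeeper localhost:2181 --describe` into the same awk filter instead of the heredoc):

```shell
#!/bin/sh
# List partitions that currently have no leader (Leader: -1) from
# `kafka-topics.sh --describe` output. The heredoc reproduces a few
# sample lines from the question; whitespace-separated fields are:
# $1=Topic: $2=<topic> $3=Partition: $4=<n> $5=Leader: $6=<id> $7=Replicas: $8=<ids>
cat <<'EOF' | awk '$5 == "Leader:" && $6 == "-1" {print $2, "partition", $4, "has no leader (replica on broker " $8 ")"}'
Topic: __consumer_offsets	Partition: 0	Leader: 1	Replicas: 1	Isr: 1
Topic: __consumer_offsets	Partition: 1	Leader: -1	Replicas: 2	Isr: 2
Topic: __consumer_offsets	Partition: 2	Leader: 0	Replicas: 0	Isr: 0
Topic: __consumer_offsets	Partition: 4	Leader: -1	Replicas: 2	Isr: 2
EOF
```

On this sample it reports partitions 1 and 4, matching the observation that only the partitions replicated on broker 2 are leaderless.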

To find out what was causing this, I restarted the kafka service on the second member, but I never expected it to have this side effect.

Also, all nodes have now been up and running for a few hours, and this is the result of `ls /brokers/ids`:

/home/kafka/bin/zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is disabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[0, 1, 2]

Also, there are many topics in the cluster; node 2 is not the leader of any of them, and wherever it is the only node holding the data (replication-factor=1, with the partition hosted on that node), leader=-1 is evident, as shown below.

Here, node 2 is in the ISR, but never a leader, since replication-factor=2.
Topic:upstream-t2   PartitionCount:20   ReplicationFactor:2 Configs:retention.ms=172800000,retention.bytes=536870912
    Topic: upstream-t2  Partition: 0    Leader: 1   Replicas: 1,2   Isr: 1,2
    Topic: upstream-t2  Partition: 1    Leader: 0   Replicas: 2,0   Isr: 0
    Topic: upstream-t2  Partition: 2    Leader: 0   Replicas: 0,1   Isr: 0
    Topic: upstream-t2  Partition: 3    Leader: 0   Replicas: 1,0   Isr: 0
    Topic: upstream-t2  Partition: 4    Leader: 1   Replicas: 2,1   Isr: 1,2
    Topic: upstream-t2  Partition: 5    Leader: 0   Replicas: 0,2   Isr: 0
    Topic: upstream-t2  Partition: 6    Leader: 1   Replicas: 1,2   Isr: 1,2
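The claim that node 2 never leads any partition can be checked by tallying leaders per broker across the describe output. A minimal sketch, run here over the upstream-t2 sample above (on a live cluster, pipe the real `kafka-topics.sh --describe` output into the same awk program):

```shell
#!/bin/sh
# Count how many partitions each broker currently leads, based on the
# `Leader:` field of `kafka-topics.sh --describe` output.
cat <<'EOF' | awk '$5 == "Leader:" {count[$6]++} END {for (b in count) print "broker", b, "leads", count[b], "partition(s)"}'
Topic: upstream-t2	Partition: 0	Leader: 1	Replicas: 1,2	Isr: 1,2
Topic: upstream-t2	Partition: 1	Leader: 0	Replicas: 2,0	Isr: 0
Topic: upstream-t2	Partition: 2	Leader: 0	Replicas: 0,1	Isr: 0
Topic: upstream-t2	Partition: 3	Leader: 0	Replicas: 1,0	Isr: 0
Topic: upstream-t2	Partition: 4	Leader: 1	Replicas: 2,1	Isr: 1,2
Topic: upstream-t2	Partition: 5	Leader: 0	Replicas: 0,2	Isr: 0
Topic: upstream-t2	Partition: 6	Leader: 1	Replicas: 1,2	Isr: 1,2
EOF
```

On this sample, brokers 0 and 1 each lead several partitions, while broker 2 does not appear at all, which is exactly the symptom described.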


Here, node 2 is the only broker some chunks of data are hosted on, but leader=-1.
Topic:upstream-t20  PartitionCount:10   ReplicationFactor:1 Configs:retention.ms=172800000,retention.bytes=536870912
    Topic: upstream-t20 Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: upstream-t20 Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: upstream-t20 Partition: 2    Leader: 0   Replicas: 0 Isr: 0
    Topic: upstream-t20 Partition: 3    Leader: 1   Replicas: 1 Isr: 1
    Topic: upstream-t20 Partition: 4    Leader: -1  Replicas: 2 Isr: 2

Any help on how to fix these unelected leaders would be much appreciated.

It would also be good to know what potential impact this might have on my brokers' behavior.

Edit -

Kafka version: 1.1.0 (2.12-1.1.0). Free space: e.g. 800GB of free disk. The log files look quite normal; below are the last 10 lines of the log file on node 2. Please let me know if there is anything specific I should look for.

[2018-12-18 10:31:43,828] INFO [Log partition=upstream-t14-1, dir=/var/lib/kafka] Rolled new log segment at offset 79149636 in 2 ms. (kafka.log.Log)
[2018-12-18 10:32:03,622] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6435}, Current: {epoch:8, offset:6386} for Partition: upstream-t41-8. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:32:03,693] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6333}, Current: {epoch:8, offset:6324} for Partition: upstream-t41-3. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:38:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 10:40:04,831] INFO Updated PartitionLeaderEpoch. New: {epoch:10, offset:6354}, Current: {epoch:8, offset:6340} for Partition: upstream-t41-9. Cache now contains 7 entries. (kafka.server.epoch.LeaderEpochFileCache)
[2018-12-18 10:48:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 10:58:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
[2018-12-18 11:05:50,770] INFO [ProducerStateManager partition=upstream-t4-17] Writing producer snapshot at offset 3086815 (kafka.log.ProducerStateManager)
[2018-12-18 11:05:50,772] INFO [Log partition=upstream-t4-17, dir=/var/lib/kafka] Rolled new log segment at offset 3086815 in 2 ms. (kafka.log.Log)
[2018-12-18 11:07:16,634] INFO [ProducerStateManager partition=upstream-t4-11] Writing producer snapshot at offset 3086497 (kafka.log.ProducerStateManager)
[2018-12-18 11:07:16,635] INFO [Log partition=upstream-t4-11, dir=/var/lib/kafka] Rolled new log segment at offset 3086497 in 1 ms. (kafka.log.Log)
[2018-12-18 11:08:15,803] INFO [ProducerStateManager partition=upstream-t4-5] Writing producer snapshot at offset 3086616 (kafka.log.ProducerStateManager)
[2018-12-18 11:08:15,804] INFO [Log partition=upstream-t4-5, dir=/var/lib/kafka] Rolled new log segment at offset 3086616 in 1 ms. (kafka.log.Log)
[2018-12-18 11:08:38,554] INFO [GroupMetadataManager brokerId=2] Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)

Edit 2 -

Well, I stopped the leader Zookeeper instance, and now the second Zookeeper instance has been elected leader! With that, the unelected-leader issue is now resolved!

I still have no idea what might have gone wrong, so any thoughts on "why did changing the Zookeeper leader fix the unelected-leader issue" are very welcome!

Thanks!

Although the root cause was never identified, the asker does seem to have found a solution:

I stopped the leader Zookeeper instance, and now the second Zookeeper instance has been elected leader! With that, the unelected-leader issue is now resolved!

