
Kafka topic partitions with leader -1

I noticed that a few of my Kafka topics are behaving in a way I cannot clearly explain.

For example:

./kafka-topics.sh --describe --zookeeper ${ip}:2181 --topic test

Topic:test  PartitionCount:3    ReplicationFactor:1 Configs:retention.ms=1209600000
    Topic: test Partition: 0    Leader: 1   Replicas: 1 Isr: 1
    Topic: test Partition: 1    Leader: -1  Replicas: 2 Isr: 2
    Topic: test Partition: 2    Leader: 3   Replicas: 3 Isr: 3
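If several topics are affected, scanning this output by hand gets tedious. Below is a minimal Python sketch (not part of any Kafka tooling) that parses the per-partition lines in the format shown above and reports partitions whose leader is -1. The regex is an assumption based on this sample; the describe format varies between Kafka versions, and some versions may print "none" instead of -1, so both are accepted.

```python
import re
from typing import List, NamedTuple


class PartitionState(NamedTuple):
    topic: str
    partition: int
    leader: int          # -1 means no leader is currently elected
    replicas: List[int]
    isr: List[int]


# Matches the per-partition lines of `kafka-topics.sh --describe`.
# The summary line ("Topic:test  PartitionCount:3 ...") is skipped
# because it contains no "Partition:" field.
_PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+|none)\s+"
    r"Replicas:\s*(?P<replicas>[\d,]+)\s+Isr:\s*(?P<isr>[\d,]*)"
)


def _ids(field: str) -> List[int]:
    """Turn a comma-separated broker list such as '1,2' into [1, 2]."""
    return [int(x) for x in field.split(",") if x]


def parse_describe(output: str) -> List[PartitionState]:
    """Parse the per-partition lines of the describe output."""
    states = []
    for line in output.splitlines():
        m = _PARTITION_LINE.search(line)
        if not m:
            continue
        leader = -1 if m["leader"] == "none" else int(m["leader"])
        states.append(PartitionState(
            topic=m["topic"],
            partition=int(m["partition"]),
            leader=leader,
            replicas=_ids(m["replicas"]),
            isr=_ids(m["isr"]),
        ))
    return states


def leaderless(states: List[PartitionState]) -> List[PartitionState]:
    """Partitions with no elected leader: producers cannot write to these."""
    return [s for s in states if s.leader == -1]
```

Fed the output above, leaderless() returns only partition 1, whose single replica is broker 2.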

I am particularly concerned about Partition 1, which shows Leader: -1.

I also notice that roughly 1/3 of the messages produced to this topic fail with a 'Timeout'. I believe this is a consequence of one of the three partitions not having a leader.
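The 1/3 figure is what you would expect if messages are spread evenly over the three partitions: every message routed to the leaderless partition waits in the producer queue until it times out. A small simulation, assuming keyed messages and a hypothetical CRC32-modulo partitioner (an illustration only, not claimed to be librdkafka's exact partitioning logic):

```python
import zlib
from typing import Callable, Iterable, Set

NUM_PARTITIONS = 3  # PartitionCount of topic "test" in the question


def crc32_partitioner(key: bytes) -> int:
    """Hypothetical hash partitioner: CRC32 of the key modulo partition count.
    Used here only to spread keys evenly; not claimed to match librdkafka."""
    return zlib.crc32(key) % NUM_PARTITIONS


def failed_fraction(keys: Iterable[bytes],
                    partition_of: Callable[[bytes], int],
                    leaderless: Set[int]) -> float:
    """Fraction of messages routed to a partition that has no leader.
    Such messages are never acknowledged, so they eventually time out."""
    keys = list(keys)
    failed = sum(1 for k in keys if partition_of(k) in leaderless)
    return failed / len(keys)


keys = [f"key-{i}".encode() for i in range(3000)]
fraction = failed_fraction(keys, crc32_partitioner, leaderless={1})
print(f"{fraction:.3f} of messages would time out")
```

With many distinct keys the result lands close to 1/3; with a strict round-robin assignment it is exactly 1/3.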

Does anyone have insight into why this happens, and how to recover from it in a production environment without losing data?

EDIT: I am using the librdkafka-based Python producer, and the error message I see is:

Message failed delivery: KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}
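To confirm that the timeouts are concentrated on partition 1, you could tally delivery reports per partition. Below is a minimal sketch of a delivery callback; the -192 value is taken from the error message above, and the class name DeliveryTally is hypothetical.

```python
from collections import Counter

# Value taken from the log line in the question:
# KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}
MSG_TIMED_OUT = -192


class DeliveryTally:
    """Delivery callback that counts successes and timeouts per partition."""

    def __init__(self):
        self.delivered = Counter()     # partition -> successful deliveries
        self.timed_out = Counter()     # partition -> _MSG_TIMED_OUT failures
        self.other_errors = Counter()  # error code -> count

    def __call__(self, err, msg):
        # Same (err, msg) shape as a confluent-kafka delivery report callback.
        partition = msg.partition()
        if err is None:
            self.delivered[partition] += 1
        elif err.code() == MSG_TIMED_OUT:
            self.timed_out[partition] += 1
        else:
            self.other_errors[err.code()] += 1
```

With the confluent-kafka client, an instance would be passed as on_delivery=tally to Producer.produce(), and producer.flush() called before reading the counters. A message that timed out before a partition was assigned may report partition -1.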

Most probably your second Kafka broker (broker 2, the only replica of partition 1) is down. To check which Kafka brokers are active, run:

./zookeeper-shell.sh localhost:2181 <<< "ls /brokers/ids"

The output should look similar to the following:

Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /brokers/ids
[0, 1, 2]
[zk: localhost:2181(CONNECTED) 1]
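Rather than comparing by eye, the broker list can be cross-checked against the Replicas column of the topic description. A minimal sketch, assuming the zookeeper-shell output format shown above and a hand-built mapping of partition to replicas (the function names are hypothetical). Note that the IDs in the sample output (0, 1, 2) are illustrative; the topic in the question uses brokers 1, 2 and 3.

```python
import ast
from typing import Dict, List, Set


def live_broker_ids(zk_output: str) -> Set[int]:
    """Return the broker IDs from the '[0, 1, 2]' line of `ls /brokers/ids`.
    Prompt lines such as '[zk: localhost:2181(CONNECTED) 1]' are skipped."""
    for line in zk_output.splitlines():
        line = line.strip()
        if line.startswith("[zk:"):
            continue
        if line.startswith("[") and line.endswith("]"):
            return set(ast.literal_eval(line))
    raise ValueError("no broker ID list found in zookeeper-shell output")


def replicas_on_dead_brokers(replicas: Dict[int, List[int]],
                             live: Set[int]) -> Dict[int, List[int]]:
    """Map partition -> replica brokers that are not registered in ZooKeeper."""
    missing = {}
    for partition, brokers in replicas.items():
        dead = [b for b in brokers if b not in live]
        if dead:
            missing[partition] = dead
    return missing
```

For the topic in the question, replicas_on_dead_brokers({0: [1], 1: [2], 2: [3]}, live) would point at partition 1 if broker 2 is missing from the list.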

If broker 2 is not listed among the active brokers, you need to figure out why it is not up and running (its logs should tell you what went wrong). I would also suggest increasing the replication factor, since you have a multi-broker setup: with a replication factor of 1, partition 1 has no other replica that could take over as leader while broker 2 is down.
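The replication factor of an existing topic is raised by submitting a reassignment plan to kafka-reassign-partitions.sh. The sketch below builds such a plan in the tool's JSON format, keeping each current replica first (so the current leader stays the preferred leader) and adding further brokers round-robin. The function name and the round-robin placement are assumptions for illustration. For partition 1, the new replica can only copy data from broker 2, so that part of the reassignment cannot complete until broker 2 is back.

```python
import json
from typing import Dict, List


def increase_rf_plan(topic: str, current: Dict[int, List[int]],
                     brokers: List[int], rf: int) -> dict:
    """Build a reassignment plan raising each partition to `rf` replicas.
    Existing replicas are kept in order; extra replicas are picked
    round-robin from `brokers`, starting after the first replica."""
    brokers = sorted(brokers)
    if rf > len(brokers):
        raise ValueError("replication factor cannot exceed the broker count")
    partitions = []
    for p in sorted(current):
        replicas = list(current[p])
        if any(b not in brokers for b in replicas):
            raise ValueError(f"partition {p} has a replica on an unknown broker")
        start = brokers.index(replicas[0])
        # Visit every other broker once, in order, until we have rf replicas.
        for step in range(1, len(brokers)):
            if len(replicas) >= rf:
                break
            candidate = brokers[(start + step) % len(brokers)]
            if candidate not in replicas:
                replicas.append(candidate)
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


# Topic "test" from the question: one replica per partition on brokers 1, 2, 3.
plan = increase_rf_plan("test", {0: [1], 1: [2], 2: [3]}, brokers=[1, 2, 3], rf=2)
print(json.dumps(plan))
```

The printed JSON can be saved to a file and passed to kafka-reassign-partitions.sh with --reassignment-json-file and --execute (together with --zookeeper on older releases or --bootstrap-server on newer ones).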

This often indicates that the broker leading that partition is offline. I would check the offline partitions metric (OfflinePartitionsCount, reported by the controller) to confirm this, and also check whether broker 2 is currently functional.
